INTRODUCTION

We are continuing to apply a comprehensive but focused structural genomics approach to determine the atomic-resolution crystal structures of key virulence factors from high-priority pathogens. In the first year, our work focused on proteins encoded by the B. anthracis virulence plasmid pXO1 and on setting up a computational database of virulence factors. In the second year, we expanded our efforts to include genome-encoded proteins of B. anthracis and proteins encoded by Variola virus, the causative agent of smallpox; initiated work to characterize a SARS virus surface protein in complex with a neutralizing antibody; and initiated work on a close homolog of a Yersinia pestis SUMOylase. We have generated a large library of expression vectors for virulence factors, as well as research quantities of pure proteins, which could readily be adapted for vaccine design. In the broader and longer term, the accumulated structural information will generate important, testable hypotheses that will increase our understanding of the molecular mechanisms of pathogenicity, putting us in a stronger position to anticipate and react to emerging pathogens.

BODY 

Task 1: Atomic resolution crystal structures of virulence factors

Cloning and expression of novel B. anthracis proteins

Expression of selected genes from the B. anthracis plasmid pXO1 in Bacillus subtilis and Bacillus megaterium cells.

We have continued our work on the expression and purification of proteins encoded by the pXO1 plasmid and selected by our bioinformatics approach (summarized below; see Task 2). Our hit rate for soluble protein expression and crystallization has been disappointing compared with our general success rate for other bacterial and eukaryotic proteins. We therefore investigated Bacillus expression systems to see whether they would provide a superior system for expressing B. anthracis proteins.

Although Bacillus strains are broadly used for industrial expression of heterologous proteins, only one company sells the expression system. Furthermore, its shuttle plasmid is underdeveloped: it lacks purification tags and secretion peptides. There are also numerous Bacillus subtilis strains and plasmids, but they have been used mostly for functional studies, where overexpression of a protein is not important. We tested two systems: Bacillus subtilis and Bacillus megaterium. Derivatives of Bacillus subtilis strain 168 (1A436, S53, 1 Al) and the plasmid pDG148 were obtained from the Bacillus Genetic Stock Center (Ohio University). Bacillus megaterium strain WH320 and the plasmid pWH1520 were obtained from MoBiTec (Germany). Bacillus subtilis strain 168 is naturally competent for transformation (uptake of plasmid DNA through the cell wall). Protein expression, however, is problematic, because this strain sporulates when the expressed protein is toxic or the growth conditions are not optimal. The value of this system for secreted expression is also limited, because B. subtilis produces too many proteases. B. megaterium strain WH320 does not sporulate, the shuttle plasmid is fairly stable in it, and it does not secrete many proteases. However, B. megaterium does not take up plasmids by transformation, and the alternative protocol, which requires removal of the cell wall by lysozyme, is unreliable.

We successfully set up the two Bacillus expression systems and tested expression of the following genes, which did not express well in E. coli: pXO1-97, pXO1-99, pXO1-119 and pXO1-125. Gene pXO1-118, which expressed well in E. coli, was used as a positive control. We found that the level of protein expression in Bacillus correlates with the level of expression in E. coli. The best results were obtained for pXO1-118 using B. megaterium; nevertheless, the expression level was about 0.5-2 mg per gram of cell mass, which is five times lower than expression from a pET plasmid in E. coli. Soluble expression of the other proteins was detectable by Western blot against the His-tag, but insufficient for crystallization. Expression of pXO1-118 in B. subtilis strains was unstable: cells often began to sporulate even before induction of protein expression (the IPTG-inducible promoter was very leaky). We also tested the plasmid pDG148 in B. megaterium and the plasmid pWH1520 in B. subtilis. Contrary to the claims of MoBiTec, the plasmids did not perform well in these foreign host cells.

We conclude that intracellular expression in Bacillus species does not give a clear advantage over the E. coli system, perhaps because the codon usage is similar and E. coli has a more developed chaperone system. However, B. megaterium may be beneficial for the expression of secreted proteins.

Expression and purification of AtxA and its homologs on pXO2, AcpA and AcpB

Full-length AtxA was expressed with or without a histidine-tag fusion and purified by Ni-affinity, heparin-Sepharose and/or anion-exchange, and gel filtration chromatography. Yields are around 2 mg per liter of cell culture. Up to five species, partially separable by heparin-Sepharose affinity chromatography, were evident. Native PAGE at mM to mM concentration shows that AtxA interacts with DNA, since the band corresponding to free DNA disappears as the concentration of AtxA increases. A stable specific complex could not yet be characterized, however, possibly because of the relatively high protein concentration or the lack of a specific site in the DNA sequence used, a 300 bp stretch upstream of the transcriptional start site of the pag gene. Current work includes further separation of the above-mentioned AtxA species, determining whether they are stable or in slow exchange with each other, and whether this affects DNA binding. Near-future plans include characterizing binding to DNA sequences from the promoters of other AtxA-regulated genes using radioactively labeled DNA. This will allow work at or near the protein-DNA dissociation constant, which is as yet undetermined but usually expected to be in the nM range.
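Why working near the dissociation constant matters can be illustrated with the simplest 1:1 binding model. The Python sketch below is illustrative only: it assumes protein is in large excess over labeled DNA (so free protein is approximately total protein), and both Kd values are hypothetical, not measurements for AtxA.

```python
def fraction_dna_bound(protein_m: float, kd_m: float) -> float:
    """Fraction of DNA in complex for simple 1:1 binding.

    Assumes protein is in large excess over DNA, so the free protein
    concentration is approximately the total protein concentration.
    """
    if protein_m < 0 or kd_m <= 0:
        raise ValueError("protein concentration must be >= 0 and Kd > 0")
    return protein_m / (protein_m + kd_m)


# Hypothetical dissociation constants (molar), chosen only for illustration.
KD_SPECIFIC = 10e-9      # a specific site in the nM range
KD_NONSPECIFIC = 10e-6   # a weak, non-specific interaction

for label, conc in [("1 nM", 1e-9), ("10 nM", 1e-8), ("100 nM", 1e-7), ("1 mM", 1e-3)]:
    spec = fraction_dna_bound(conc, KD_SPECIFIC)
    nonspec = fraction_dna_bound(conc, KD_NONSPECIFIC)
    print(f"{label:>6}: specific {spec:.3f}, non-specific {nonspec:.3f}")
```

Under these assumptions, 10 nM protein half-saturates the hypothetical specific site while leaving the non-specific site essentially empty, whereas at 1 mM both are about 99% occupied or more. Trace amounts of radiolabeled DNA let protein be titrated through the nM range, where the two kinds of site can be told apart.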

AcpB was expressed as a histidine-tag fusion and purified with similar results. AcpA appears to be toxic to E. coli cells, as their growth is significantly slowed when they are transformed with a plasmid encoding the histidine-tagged protein, and yields were therefore an order of magnitude lower. Current work focuses on the cloning, expression and purification of native (untagged) AcpA and AcpB, and future plans include characterizing their binding to DNA, as for AtxA.

Structural Studies of inhibitor binding to Lethal Factor

Our work to determine LF-inhibitor complexes in collaboration with the Bavari and Gussio groups at USAMRIID and NCI continues. Crystals of full-length LF were grown under high-salt conditions, and this may have hampered the determination of high-quality inhibitor complexes in several cases. To overcome these problems, we have cloned, expressed and crystallized a fragment of LF that lacks domain 1 (the PA-binding domain) but contains the critical catalytic module (domains 2-4). This protein expresses readily in E. coli, crystallizes from low-salt (PEG) conditions, and diffracts X-rays to high resolution. We are now repeating our inhibitor soaks and co-crystallization experiments under these low-salt conditions.

Crystal structure of an anthrax toxin-host cell receptor complex

Two closely related host cell receptors, TEM8 and CMG2, bind PA with high affinity and are required for toxicity. We determined the crystal structure of the PA-CMG2 complex at 2.5 Å resolution (published in Nature; see Appendix 1). The structure reveals an extensive receptor-pathogen interaction surface that mimics the non-pathogenic recognition of the extracellular matrix by integrins. The binding surface is closely conserved in the two receptors and across species, but quite different in the integrin domains, explaining the specificity of the interaction. CMG2 engages two domains of PA, and modeling of the receptor-bound PA63 heptamer suggests that the receptor acts as a pH-sensitive chaperone to ensure accurate and timely membrane insertion.

Structural studies on a B. anthracis epimerase involved in lysine biosynthesis

Lysine biosynthesis in bacteria provides two essential components: L-lysine for protein synthesis and meso-diaminopimelate for construction of the bacterial peptidoglycan cell wall. Since the lysine biosynthesis pathway is absent in mammals and unique to bacteria, the enzymes involved may be useful targets for antibiotic design. Recent genome sequence analysis of B. anthracis revealed the complete sequences of the enzymes involved in lysine biosynthesis. Diaminopimelate epimerase (EC 5.1.1.7), an enzyme in this pathway, catalyzes the racemization of L,L- to D,L-meso-diaminopimelate, the immediate precursor of L-lysine in B. anthracis. Several enzymes involved in racemization require pyridoxal 5'-phosphate (PLP) as a cofactor for their activity; however, little is known about the structural basis of PLP dependence for the activity of anthrax diaminopimelate epimerase. The objective of this study is therefore to determine the crystal structure of anthrax diaminopimelate epimerase to investigate its structure-function relationship.

1. Cloning, expression and purification: The gene encoding diaminopimelate epimerase (EP) from B. anthracis (gene code: BA5170; MW = 32 kDa; 288 residues) was cloned from genomic DNA and inserted into pET15b at the BamHI/NdeI sites. The recombinant protein was not expressed in the standard E. coli strain BL21(DE3), even under several different growth conditions (LB/TB medium, different IPTG concentrations, and high/low temperature). Sequence analysis revealed that the coding sequence contains many rare codons, suggesting incomplete translation in the BL21(DE3) strain, consistent with the absence of recombinant protein production in this host. In contrast, another E. coli strain, Rosetta(DE3)pLysS, dramatically increased the expression of soluble recombinant EP protein using 2X YT medium at 37 °C.
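The rare-codon explanation can be checked with a simple codon scan of the open reading frame. The Python sketch below counts the codons for which Rosetta strains are commonly described as supplying extra tRNAs (AGG, AGA, AUA, CUA, CCC and GGA); that codon set is an assumption, and the input is a short hypothetical ORF, not the BA5170 sequence.

```python
from collections import Counter

# Codons (DNA alphabet) for which Rosetta strains carry extra tRNA genes.
# Treat this set as an assumption; other rare-codon definitions exist.
ROSETTA_RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CCC", "GGA"}


def rare_codon_report(orf: str, rare=ROSETTA_RARE_CODONS):
    """Return (counts of each rare codon, fraction of all codons that are rare)."""
    seq = orf.upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("ORF must be non-empty and a multiple of 3 nt long")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(c for c in codons if c in rare)
    return counts, sum(counts.values()) / len(codons)


# Hypothetical 7-codon ORF used only to exercise the function.
counts, fraction = rare_codon_report("ATGAGAAGGATACTAGGATAA")
print(dict(counts), f"{fraction:.2f}")
```

A fuller analysis would also flag runs of consecutive rare codons, which are generally considered more likely to stall translation than isolated ones.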

Large-scale protein expression was carried out with Rosetta(DE3)pLysS cultures in 2X YT medium. The recombinant EP protein was purified from the cell-free extract by Ni-affinity chromatography followed by gel filtration chromatography (Superdex 200). MALDI-MS analysis of the purified protein revealed a strong single peak at the expected molecular mass (33.6 kDa). Gel filtration, however, showed that the recombinant EP protein eluted at an apparent molecular mass of >60 kDa, suggesting that it forms a dimer in solution. This purification protocol typically yielded >70 mg of purified EP protein per liter of bacterial culture. 5 mM DTT was always included in the gel filtration running buffer. The His-tag has NOT been cleaved.
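An apparent molecular mass from gel filtration is normally read off a calibration curve of standard proteins, and the oligomeric state follows from its ratio to the monomer mass (33.6 kDa by MALDI-MS). The Python sketch below shows that calculation under stated assumptions: log10(MW) is taken to be linear in the partition coefficient K_av = (Ve - V0)/(Vt - V0), and the void volume, total volume, standards and elution volume are all hypothetical values, not data from this experiment.

```python
import math


def kav(ve: float, v0: float, vt: float) -> float:
    """Partition coefficient K_av = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_calibration(standards, v0: float, vt: float):
    """Least-squares fit of log10(MW) = intercept + slope * K_av.

    standards: list of (molecular_weight_da, elution_volume_ml) pairs.
    """
    if len(standards) < 2:
        raise ValueError("need at least two standards")
    xs = [kav(ve, v0, vt) for _, ve in standards]
    ys = [math.log10(mw) for mw, _ in standards]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def apparent_mw(ve: float, v0: float, vt: float, intercept: float, slope: float) -> float:
    """Apparent molecular weight (Da) for a given elution volume."""
    return 10 ** (intercept + slope * kav(ve, v0, vt))


# Hypothetical column volumes (mL) and standards (Da, elution volume in mL).
V0, VT = 8.0, 24.0
STANDARDS = [(440_000, 10.3), (158_000, 12.6), (44_000, 15.2), (13_700, 17.4)]

intercept, slope = fit_calibration(STANDARDS, V0, VT)
mw = apparent_mw(14.0, V0, VT, intercept, slope)  # hypothetical EP elution volume
print(f"apparent MW ~ {mw / 1000:.0f} kDa, ratio to monomer ~ {mw / 33_600:.1f}")
```

Because elution depends on shape as well as mass, a ratio near 2 from such a curve supports, but does not prove, a dimer.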

Crystallographic characterization.

Crystals of EP were grown in several reservoir conditions using commercial screening kits (Table). The best crystals were obtained with a reservoir of 0.1 M Na/K phosphate, pH 6.6, 20% PEG 3350, and 0.2 M ammonium formate. The protein concentration used was 13 mg/mL, in 20 mM Tris, pH 8, 150 mM NaCl, and 5 mM DTT. The crystals diffracted to ~2.5 Å resolution using a Rigaku FR-E X-ray generator. The crystals belong to space group P2₁2₁2₁, with cell dimensions a = 64.9, b = 85.5, c = 113.0 Å. The crystal structure has recently been solved by molecular replacement, and refinement and inhibitor design are in progress.

Table. Crystallographic characterization of the native EP crystal (sample: EP02)

Parameter                               Value
Space group                             P2₁2₁2₁
Cell dimensions (Å)                     a = 64.9, b = 85.5, c = 113.0
Resolution range (Å)                    30-2.7 (2.8-2.7)
No. of observed reflections             50105
No. of unique reflections               16106 (1298)
Completeness (%)                        90.0 (74.7)
R_merge                                 0.088 (0.288)
I/σ(I)                                  13.8 (2.7)
V_M (Å³/Da)                             2.4
No. of molecules per asymmetric unit    2
Solvent content (%)                     49

Values in parentheses refer to the highest-resolution shell.
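The V_M and solvent-content entries follow from Matthews' relations, V_M = V_cell / (Z × n × M) and solvent fraction ≈ 1 - 1.23/V_M, where Z = 4 asymmetric units per cell for P2₁2₁2₁, n is the number of molecules per asymmetric unit and M is the molecular mass. The Python sketch below reproduces the calculation from the cell dimensions in the table; which mass to use (32 kDa untagged versus 33.6 kDa with the His-tag) is an assumption, and the 1.23 constant assumes a typical protein partial specific volume.

```python
def matthews(a: float, b: float, c: float, z: int, n_per_asu: int, mw_da: float):
    """Matthews coefficient (A^3/Da) and solvent fraction for an orthorhombic cell."""
    cell_volume = a * b * c  # valid only for orthorhombic cells (all angles 90 degrees)
    vm = cell_volume / (z * n_per_asu * mw_da)
    solvent_fraction = 1.0 - 1.23 / vm
    return vm, solvent_fraction


# Cell dimensions from the table; Z = 4 for P2(1)2(1)2(1); 2 molecules per asymmetric unit.
for label, mass in [("untagged, 32 kDa", 32_000), ("His-tagged, 33.6 kDa", 33_600)]:
    vm, solvent = matthews(64.9, 85.5, 113.0, z=4, n_per_asu=2, mw_da=mass)
    print(f"{label}: V_M = {vm:.2f} A^3/Da, solvent = {100 * solvent:.1f}%")
```

With the untagged mass this gives V_M ≈ 2.45 Å³/Da and ≈ 50% solvent, in line with the tabulated 2.4 and 49%; the tagged mass gives V_M ≈ 2.33 Å³/Da and ≈ 47% solvent.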
B. anthracis endolysin studies (manuscript submitted; see Appendix 2)

Endolysins are cell-wall-dissolving enzymes used by phages to lyse their host and release their progeny, and they are potential antibacterial agents. The aim of this study is to examine whether the integrated copies of prophage endolysins within the B. anthracis Sterne strain can be used as antibacterial agents for the treatment and prophylaxis of anthrax and other Gram-positive bacterial infections.

Two targets were selected from the B. anthracis Sterne strain: a prophage amidase and a prophage glycosidase. Both are two-domain proteins, consisting of an N-terminal catalytic domain and an 80-amino-acid C-terminal putative cell-wall-binding domain. The amidase cleaves the bond between N-acetylmuramic acid and L-alanine, while the glycosidase cleaves the bond between N-acetylglucosamine and N-acetylmuramic acid of the cell wall. The C-terminal cell-wall-binding domains of the two endolysins have very high sequence homology (68% identity). Although the two genes are not in the same prophage region, it is believed that, given their similar cell-wall-binding domains, the two enzymes could work together synergistically.
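Percent-identity figures such as the 68% quoted above depend on both the alignment and the convention used to normalize it. The Python sketch below computes one common convention (identical positions divided by gap-free aligned columns) from a pairwise alignment assumed to have been produced already, for example by a Needleman-Wunsch global alignment; the sequences are hypothetical fragments, not the endolysin sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        raise ValueError("no gap-free aligned columns")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Hypothetical aligned fragments; a real comparison would use the full
# pairwise alignment of the two cell-wall-binding domains.
print(percent_identity("MKV-LSTAG", "MRVQLSTSG"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values whenever the alignment contains gaps.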

Both proteins can be expressed in an E. coli system at more than 20 mg per liter of culture. They can be purified easily by standard techniques, but the full-length proteins were less soluble than the catalytic domains. Crystallization trials were set up for all constructs. The full-length proteins precipitated in the majority of the screen conditions, even at concentrations below 5 mg/ml; the catalytic domains, however, crystallized readily. The endolysin sequences are relatively distant (less than 27% identity) from all known structures of similar enzyme classes, and molecular replacement using standard techniques failed to provide phase information. Multiwavelength anomalous diffraction (using selenomethionine-labeled protein) and single isomorphous replacement (using a methylmercury nitrate-derivatized crystal) were used to solve the structures of the amidase and glycosidase, respectively. The highest resolutions for the catalytic domains were 1.8 Å for the amidase and 1.4 Å for the glycosidase. The anthrax prophage amidase structure resembles the T7 amidase fold, whereas the prophage glycosidase adopts the Chalaropsis glycosidase family fold.

Both enzymes were shown to hydrolyze bacterial cell wall peptidoglycan and to kill several Bacillus strains (B. anthracis Sterne, B. cereus, B. megaterium, and B. subtilis) in vitro within 15 minutes at sub-micromolar concentrations. Surprisingly, the N-terminal catalytic domain was also found to be significantly more active than the full-length protein against most of the Bacillus strains tested. The C-terminal domain was later cloned into an expression vector as a fusion with green fluorescent protein (GFP). GFP fused to the C-terminal domain of the amidase was able to coat the surface of B. cereus but not of other strains (B. anthracis has not yet been tested). These results suggest that the C-terminal domain of the amidase could act as a negative regulator while also providing selectivity for cell-wall binding. The cell-wall-binding domain of the amidase has also been crystallized, and its structure determination is underway.

In conclusion, we have determined the minimal endolysin protein constructs as potential candidates for antibacterial treatment. These constructs are likely to be more active than the full-length proteins, including full-length PlyG, because they lack the specific cell-wall-binding domain, which may inhibit the activity of the catalytic domain when used against strains other than the specific host. These minimal catalytic domains will be tested on other Gram-positive bacterial strains in the near future, as soon as these strains become available.

Collagen binding protein BA5258 of B. anthracis

B. anthracis, like other Gram-positive bacteria, attaches to the host via cell-wall-anchored proteins. Two such proteins from B. anthracis, BA0871 and BA5258, were characterized by Xu et al. (2004). Both have sequence homology to CNA, a cell-wall-anchored collagen adhesin of S. aureus. Full-length BA5258, excluding the leader sequence, has been cloned into an E. coli expression vector. It can be expressed and purified to a final yield of 10 mg/L culture. The protein is extremely soluble and resistant to limited proteolysis with trypsin, elastase, and chymotrypsin. Crystallization trials of the protein alone and with a collagen peptide are in progress, and small but promising crystals have been obtained.

Structural studies of the SARS S1 (spike protein) and its complex with a high-affinity antibody.

In collaboration with Dr. Wayne Marasco, Dana-Farber Cancer Institute, Boston, we have initiated a structural study of the SARS S1 spike protein in complex with a high-affinity antibody ("80R") (Sui et al., 2004). Both the S1 protein and the antibody have been expressed and purified in milligram quantities, and the S1 protein has been crystallized alone and in complex with the antibody (figure below). X-ray data sets have been collected to 2.3 Å resolution, and structure determination is in progress.
Structural studies of Variola proteins

A highly conserved poxvirus protein, N1L (Vaccinia gene name), with no significant homology to any non-poxvirus protein, was recently shown to be a viral inhibitor of host innate immunity (DiPerna et al., 2004). N1L is a small 14 kDa protein, highly conserved among poxviruses, with 94% sequence identity between the Vaccinia and Variola orthologs. N1L is considered one of the most potent virulence factors, based on the attenuated phenotype of the corresponding recombinant mutant Vaccinia virus (Kotwal et al., 1989). N1L associates with several kinases within the multi-subunit IKK complex and interacts most strongly with TANK-binding kinase 1 (TBK1). The N1L gene was amplified from genomic DNA of Vaccinia Western Reserve and Cowpox Brighton Red (a gift from Dr. D.J. Pickup, Duke University). We have successfully produced homogeneous samples of Vaccinia N1L, as judged by SDS-PAGE (Fig. A), and have obtained small needle crystals of His-tagged N1L (Fig. B). Structural analysis is underway.


Figure. Purified His-tagged and non-tagged Vaccinia N1L analyzed by SDS-PAGE (A) and preliminary small needle crystals of N1L (B).

NMR-based structural characterization of virulence factors.

The overall goal of Dr. Pellecchia's laboratory within this project is to support the determination of the structures of key virulence factors by NMR spectroscopy. In particular, Dr. Pellecchia focused on the identification, expression and purification of novel virulence factors for subsequent NMR analysis. A group of bacterial genes homologous to the human ubiquitin-like protease (Ulp) or SUMO-specific protease (SUMOylase) has been identified by bioinformatics methods in Dr. Godzik's laboratory. These proteins are also related to the Yersinia virulence factor YopP. Dr. Pellecchia focused his efforts on a particular protein construct from Salmonella typhimurium called Virulase ST. In unpublished work, Dr. Reed's laboratory has established that, much like YopP, Virulase ST regulates apoptosis and inflammation in infected host cells, presumably via the NF-κB pathway. To provide additional insights into the function and role of this protein in the onset and propagation of infection, Dr. Pellecchia began to investigate protein constructs from this family for subsequent structural analysis by NMR. Because the putative catalytic domain of Virulase ST has been shown to induce apoptosis (measured by caspase-3 activation) in transfected human cells, the studies focused on this domain.

Recombinant Virulase ST (145-326) was produced from a pET-19b (Novagen) plasmid construct containing the nucleotide sequence for the catalytic domain fused to an N-terminal poly-His tag. Unlabeled protein was expressed in E. coli BL21 in LB medium at 37 °C, with an induction period of 3-4 hours with 1 mM IPTG. ¹⁵N-labeled protein was produced similarly, with growth in M9 medium supplemented with 0.5 g/L ¹⁵NH₄Cl. Double-labeled ¹³C/¹⁵N protein and triple-labeled ²H/¹⁵N/¹³C protein were produced similarly in M9 medium supplemented with ¹³C-glucose (2 g/L) and, for the triple-labeled sample, ²H₂O (70%). Following cell lysis, soluble protein was purified over a HiTrap chelating column (Amersham Pharmacia). Final protein samples were dialyzed into a buffer appropriate for the subsequent experiments.

A number of stability tests were then performed to verify that the protein would survive the time needed to collect the NMR experiments for resonance assignments (2-3 weeks). Unfortunately, the protein is not long-lived: it tends to aggregate very rapidly (within hours) or gets cleaved. To increase the stability of the catalytic domain of Virulase ST, a number of conditions were tested, including temperature, pH, detergents (Triton and NP-40, both at 0.1%) and salts. Conditions that led to samples stable for ~3-7 days included a second purification step (ion-exchange chromatography on a MonoQ column, Amersham Pharmacia), pH 7.2, 100 mM NaCl and 50 mM each of arginine and glutamic acid. Because 3-7 days is still too short for a complete set of NMR experiments, several samples were prepared. 2D [¹H,¹⁵N] HSQC and TROSY-type experiments were then carried out on a 600 MHz spectrometer at 20 °C and 30 °C. A typical 2D [¹⁵N,¹H] HSQC spectrum of the resulting sample is shown in Figure 1A. The number of peaks and their dispersion are indicative of a folded monomeric protein. Chemical shift dispersion of the ¹³Cα (and ¹³Cβ) resonances from initial triple-resonance experiments (Figure 1B) suggests a mixed α/β secondary structure, although there is probably a flexible region as well. Therefore, while additional work is needed to complete the acquisition of a minimal data set for structure determination, samples that appear well suited for high-resolution studies have been obtained. The isotopically labeled samples and the preliminary NMR data lay the foundation for a detailed structure determination project, for which funds are being sought elsewhere. In line with the objectives of this project, the work initiated here will shed additional light on the function of this class of essential virulence factors and will represent the starting point for inhibitor design and subsequent target validation studies.
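The judgment that the number of HSQC peaks indicates a folded monomeric protein rests on comparing the observed cross-peaks with the number expected from the construct sequence. The Python sketch below gives that expected count under simple assumptions (one backbone amide peak per non-proline residue, none for the N-terminal residue) and lists residues that can add side-chain NH peaks; the example sequence is hypothetical, not the Virulase ST (145-326) construct.

```python
def expected_backbone_amide_peaks(sequence: str) -> int:
    """Estimate of backbone amide cross-peaks in a 2D [1H,15N] HSQC.

    Proline has no amide proton, and the N-terminal residue's free amine is
    normally not observed, so both are excluded from the count.
    """
    seq = sequence.upper()
    if not seq:
        return 0
    return sum(1 for residue in seq[1:] if residue != "P")


def side_chain_nh_sources(sequence: str) -> dict:
    """Residues that can add side-chain NH peaks (Asn/Gln NH2 pairs, Trp indole NH)."""
    seq = sequence.upper()
    return {"Asn+Gln": seq.count("N") + seq.count("Q"), "Trp": seq.count("W")}


# Hypothetical short sequence, used only to exercise the functions.
example = "MGSSHHHHHHPAWNQLPK"
print(expected_backbone_amide_peaks(example), side_chain_nh_sources(example))
```

In practice the observed count can fall short of this estimate (flexible tag residues, exchange broadening) or exceed it (multiple conformations), so the comparison is a qualitative check.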
Figure 1. A) 2D [¹H,¹⁵N] HSQC spectrum of the catalytic domain of Virulase ST acquired on a 1 mM sample in phosphate buffer, pH 7.2, 50 mM each Arg/Glu, 100 mM NaCl. The spectrum was acquired with ns = 16 at 20 °C on a 600 MHz Bruker Avance instrument. B) Typical strips taken at different ¹⁵N chemical shifts from a 3D HNCA experiment measured with a triple-labeled ²H/¹³C/¹⁵N sample.
Task 2: Collect expression vectors and purified proteins into a library suitable for use by other interested groups, and post the information on our website.


This task is ongoing for B. anthracis and other Class A pathogens; target selection and experimental updates are made monthly in light of new cloning, expression and structural data. The current status is summarized below. We will make this information publicly available if this is deemed appropriate by USAMRMC.
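One way to keep the construct library in a form that can be posted on a website is a simple structured record per construct. The Python sketch below is a hypothetical schema (the class, field and function names are illustrative, not an existing project database), populated with a few entries paraphrased from the status summary in this report and exported to JSON.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ConstructRecord:
    """Hypothetical schema for one entry in the shared construct library."""
    gene_id: str                   # plasmid gene designation, e.g. "pXO1-37"
    residues: str                  # construct boundaries, e.g. "1-193"
    status: str                    # e.g. "soluble", "inclusion bodies", "crystallized"
    tag: Optional[str] = None      # purification tag, if any
    vector: Optional[str] = None   # expression vector, if recorded
    notes: list = field(default_factory=list)


def export_library(records) -> str:
    """Serialize records to JSON, e.g. for posting on a project web page."""
    return json.dumps([asdict(r) for r in records], indent=2)


def status_summary(records) -> Counter:
    """Count constructs by status."""
    return Counter(r.status for r in records)


# Entries paraphrased from the status summary in this report.
library = [
    ConstructRecord("pXO1-37", "1-193", "soluble", tag="His", notes=["crystallization setups begun"]),
    ConstructRecord("pXO1-47", "1-201", "inclusion bodies", tag="His", notes=["refolded; DSC underway"]),
    ConstructRecord("pXO1-104", "1-61", "inclusion bodies", tag="His", notes=["refolding underway"]),
]
print(status_summary(library))
print(export_library(library))
```

Leaving the vector as None where the report does not state it keeps the export from implying information that was not recorded.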

Summary of cloning, expression and purification of novel pXO1 proteins:

pXO1-37 (acetyltransferase): His-tagged full-length pXO1-37 (1-193) was overexpressed in soluble form in E. coli at 30 °C. The previous instability problem upon concentration was solved by adding 100 mM DTT to the protein solution after Ni-column purification. Crystallization setups have begun.

pXO1-47 (transcription activator of multidrug efflux): His-tagged full-length pXO1-47 (1-201) was overexpressed in inclusion bodies. Varying the expression conditions did not yield soluble protein. pXO1-47 was purified under denaturing conditions on a Ni column and refolded as soluble protein. A DSC experiment is underway to demonstrate correct folding.

pXO1-87 and pXO1-99 were expressed but proved difficult to purify. Both proteins co-purified with a 60 kDa protein, which is suspected to be a heat shock protein or chaperonin. High-resolution columns (Superdex 200 HR gel filtration, MonoS and MonoQ) could not remove the contaminant. Mg²⁺-ATP has been shown to enhance dissociation of the E. coli chaperonin from proteins with large exposed hydrophobic surface areas, and it will be used in the immediate future for purification of pXO1-87 and pXO1-99.

Principal Investigator: Liddington, Robert C.

pXO1-97 was cloned and gave soluble protein; structural analysis by NMR is in progress.

pXO1-104: His-tagged full-length pXO1-104 (1-61) was overexpressed as inclusion bodies. Other conditions have been tried to express it in soluble form, without success. Refolding experiments are underway.

pXO1-109/PagR: cloning and soluble expression; crystallization trials in progress.

pXO1-111 (homologous to PA domain 4): cloning and soluble expression; crystallization trials in progress.

pXO1-116: cloning unsuccessful so far.

pXO1-117 and pXO1-143: cloning successful, but no expression in E. coli.

pXO1-118 (and pXO2-61) have been crystallized and their structures determined (see Appendix 3).

pXO1-121: His-tagged full-length pXO1-121 (1-58) was overexpressed as inclusion bodies. Other conditions have been tried to express it in soluble form, without success. Refolding is underway.

pXO1-125: cloning and expression successful, but the protein is insoluble and could not be refolded.

Cloning of all the following target genes as full-length proteins has been completed, and expression trials are in progress. All the genes have now been subcloned into the bacterial expression vector pET28a:

pXO1-96, 274 residues, homologous to a putative transposase;

pXO1-103, 317 residues, homologous to a site-specific recombinase;

pXO1-105, 67 residues, homologous to regulators of stationary-phase/sporulation gene expression;

pXO1-126, 151 residues, homologous to the uncharacterized ACR ML0644;

pXO1-130, 237 residues, predicted periplasmic or secreted protein.

pXO1-109 (PagR) expressed in E. coli and purified.

pXO1-110 (PA) expressed in E. coli and purified;

604-735 (domain IV) expressed, partially purified
597-735 (domain IV) expressed, purified
588-735 (domain IV) expressed, partially purified

pXO1-107 (LF) expressed in E. coli and purified; catalytic mutants E687C and E786A expressed and purified.

263-776 (domains II-IV) expressed, purified and crystallized

pXO1-119 (AtxA) full-length and 1-393 expressed and purified;

1-141 and 1-160 (putative DNA-binding domain) expressed, insoluble;

141-475, 162-475, 141-393, 162-393 (putative regulatory domains);

388-475 expressed, soluble, precipitates during purification

pXO1-138 (PagR homolog) expressed, soluble


pX02-53  (AcpB)  expressed  and  purified 

pX02-64  (AcpA)  expressed  and  purified  (low  yield  «  1  mg/1) 

The  following  gene  products  of  unknown  function  have  been  cloned  expressed  and 
purified:  pXOl-04,  pXOl-07,  pXOl-10,  pXOl-32,  pXOl-90,  pXOl-94,  pXOl-98,  a 
truncated  form  of  pXOl-98,  pXOl-117,  pXOM24,  pXOl-127,  and  pXOl-132. 


pXO1-1, pXO1-15, pXO1-125, pXO1-117, pXO1-128 and pXO1-143 were expressed in E. coli
as insoluble proteins. Refolding in an arginine-containing buffer solubilized the proteins,
but they precipitated during removal of the refolding buffer. pXO1-87 and pXO1-99 could
be purified, but only as soluble aggregates, which precipitate at high concentration.

Task 3: Develop a computational database of virulence-related genes

Bioinformatics and Target Selection. The main focus of the bioinformatics part of the
grant is the development of an annotated collection of virulence factors. To this end, we
developed the VirFact database (http://virfact.burnham.org), which contains information
on microbial virulence factors and pathogenicity islands (PAIs) from major pathogens.
The database initially contained information collected manually from the literature, which
we then combined with results obtained by genome context analysis and distant homology
recognition. The database can be browsed by virulence factor, PAI or organism name.
The annotations, including multiple alignments of proteins homologous to virulence
factors, genomic context and models of three-dimensional structures (where available), are
presented through a graphical web interface and standard visualization tools. VirFact can
also be used as a tool to recognize homologs of known virulence factors in a genome
supplied by the user. For instance, applying VirFact to the Francisella tularensis genome
allowed us to recognize over 50 known virulence factors in this genome.
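To illustrate what browsing "by virulence factor, PAI or organism name" involves in a relational setting, the sketch below builds a tiny in-memory SQLite database and queries it by organism. The table and column names are hypothetical and are not the actual VirFact schema; the three rows (the pXO1-encoded toxin genes) are demonstration data only.

```python
import sqlite3

# Hypothetical minimal schema for illustration; NOT the real VirFact schema.
SCHEMA = """
CREATE TABLE organism (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE pai (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(id)
);
CREATE TABLE virulence_factor (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(id),
    pai_id      INTEGER REFERENCES pai(id)  -- NULL if not on a known PAI
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few demonstration rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO organism (id, name) VALUES (1, 'Bacillus anthracis')")
    conn.execute("INSERT INTO pai (id, name, organism_id) VALUES (1, 'pXO1 PAI', 1)")
    conn.executemany(
        "INSERT INTO virulence_factor (name, organism_id, pai_id) VALUES (?, ?, ?)",
        [
            ("protective antigen (pagA)", 1, 1),
            ("lethal factor (lef)", 1, 1),
            ("oedema factor (cya)", 1, 1),
        ],
    )
    return conn


def factors_by_organism(conn: sqlite3.Connection, organism_name: str) -> list:
    """Browse by organism name: return the names of its virulence factors, sorted."""
    rows = conn.execute(
        """SELECT vf.name
           FROM virulence_factor AS vf
           JOIN organism AS o ON o.id = vf.organism_id
           WHERE o.name = ?
           ORDER BY vf.name""",
        (organism_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Browsing by PAI or by factor name would be analogous joins on the `pai` table or a filter on `virulence_factor.name`.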
We also used several of the annotation tools developed in our group for a detailed
analysis of the anthrax virulence plasmids. Using a combination of advanced bioinformatics
tools, including context analysis, distant homology and fold recognition, we have
reannotated the predicted open reading frames (ORFs) on the pXO1 plasmid, most of which
were described as proteins of unknown function in previous analyses. With these improved
annotation tools, we significantly enhanced the annotation of the pXO1 plasmid, bringing
the total number of ORFs with some level of functional annotation from 48 to over 100.
The new results also clearly show the mosaic nature of pXO1 and give tantalizing hints
about the origin of anthrax virulence. The highlights of the new findings are two type IV
secretion system-like clusters present on the pathogenicity island of the pXO1 plasmid, as
well as at least three clusters related to DNA processing. Similar annotations of the pXO2
plasmid, and of the pathogenicity islands of several bacteria from the Streptococcus group,
are now in preparation.

Key Research Accomplishments

• Development of the VirFact database (http://virfact.burnham.org) of virulence factors

• Successful cloning and/or expression of more than 50 proteins and domain
fragments from B. anthracis and other Class A pathogens.

• Crystal structures and functional characterization of B. anthracis prophage amidase and
lysozyme. The amidase is homologous to a bactericidal phage enzyme that specifically
kills B. anthracis.

• Crystallization of virulence factors from Variola virus and SARS virus.

• Crystal structure of anthrax PA in complex with its host receptor (published in Nature).

Reportable Outcomes

Published manuscripts:

Santelli E, Bankston LA, Leppla SH & Liddington RC. Crystal structure of a complex
between anthrax toxin and its host cell receptor. Nature. 2004 Aug 19;430(7002):905-8.
Epub 2004 Jul 4.

Manuscripts under review:

Lieh Yoon Low, Chen Yang, Marta Perego, Andrei Osterman and Robert Liddington.
“Structure and lytic activity of a Bacillus anthracis prophage endolysin”

Marcin Grynberg, Iddo Friedberg, Marc Robinson-Rechavi and Adam Godzik.
“Surprising connections: in-depth analysis of the Bacillus anthracis pXO1 plasmid”

Adrian Tkacz, Leszek Rychlewski and Adam Godzik. “VirFact: a relational database of
virulence factors and pathogenicity islands (PAIs)”

Reagents generated:

• Expression vectors for more than 50 virulence factors.

• Atomic coordinates for the protective antigen-host cell receptor complex have been
deposited in the Protein Data Bank and are freely available. Atomic coordinates for the
other crystal structures derived here will be released upon publication.

Funding arising from these studies:

Some of the work described here has led to a Program Project grant from NIAID led by
Dr. Liddington (P01 AI 55789-01). This proposal was funded, effective 7/04.

Our work on inhibitors of anthrax lethal factor played a large part in our successful
application to NIAID to develop a novel class of inhibitors using in silico and NMR-based
methods combined with crystallography (U19 AI56385-01; P.I., Dr. Alex Strongin).
Our general approach also led to a successful application for novel therapeutic
treatments of smallpox (U01 AI061139; P.I., Dr. Alex Strongin).

Conclusions

In this second year of funding, we have continued to focus our attention on target selection,
protein expression, purification and crystallization of proteins encoded by the Bacillus
anthracis pXO1 plasmid, and we have broadened our approach to include (1) focused studies
on B. anthracis genome-encoded proteins and (2) structural studies of virulence factors from
Variola virus and SARS virus. We have cloned and expressed a total of 50 new proteins, and
structural analysis of several of these is underway. Currently, six new crystal structures are
essentially complete. We have also determined the first crystal structure of a complex
between anthrax protective antigen and its host cell receptor (published in Nature).

So what section: Post-exposure therapeutics do not exist for any of the major pathogens
likely to be used in biowarfare or bioterrorism. Our work identifies key protein “virulence
factors” from these organisms and characterizes them structurally and functionally, enabling
the rational, structure-based design of small-molecule inhibitors that can lead to the
development of therapeutic drugs to treat anthrax, smallpox, plague and SARS.
Crystal structure of a complex
between anthrax toxin and
its host cell receptor

Eugenio Santelli¹, Laurie A. Bankston¹, Stephen H. Leppla²
& Robert C. Liddington¹

¹Program on Cell Adhesion, The Burnham Institute, 10901 North Torrey Pines
Road, La Jolla, California 92037, USA

²Microbial Pathogenesis Section, National Institute of Allergy and Infectious
Diseases, NIH, Bethesda, Maryland 20892, USA

Anthrax toxin consists of the proteins protective antigen (PA),
lethal factor (LF) and oedema factor (EF). The first step of toxin
entry into host cells is the recognition by PA of a receptor on the
surface of the target cell. Subsequent cleavage of receptor-bound
PA enables EF and LF to bind and form a heptameric PA63 pre-pore,
which triggers endocytosis. Upon acidification of the endosome,
PA63 forms a pore that inserts into the membrane and translocates
EF and LF into the cytosol. Two closely related host cell receptors,
TEM8 and CMG2, have been identified. Both bind to PA with high
affinity and are capable of mediating toxicity. Here, we report the
crystal structure of the PA-CMG2 complex at 2.5 Å resolution. The
structure reveals an extensive receptor-pathogen interaction surface
mimicking the non-pathogenic recognition of the extracellular matrix
by integrins. The binding surface is closely conserved in the two
receptors and across species, but is quite different in the integrin
domains, explaining the specificity of the interaction. CMG2 engages
two domains of PA, and modelling of the receptor-bound PA63
heptamer suggests that the receptor acts as a pH-sensitive brace to
ensure accurate and timely membrane insertion. The structure
provides new leads for the discovery of anthrax antitoxins, and
should aid the design of cancer therapeutics.

Figure 1 Structure of the PA-CMG2 complex. Two orthogonal views are shown in ribbon
representation. PA is coloured by domain (I-IV); CMG2 is blue; the metal ion is shown as a
magenta ball. PA domain I is cleaved after receptor binding, leading to the loss of domain
Ia (yellow) and the formation of PA63. All molecular graphics images were generated using
the UCSF Chimera package (http://www.cgl.ucsf.edu/chimera).

Both TEM8 and CMG2 contain a domain that is homologous
to the I domains of integrins, which comprise a Rossmann-like
α/β-fold with a metal-ion-dependent adhesion site (MIDAS) motif
on their upper surface. Crystal structures of the CMG2 I domain
and full-length PA proteins have previously been determined.
The PA monomer is a long, slender molecule comprising four
distinct domains. In the PA-CMG2 I domain complex, two of
these four domains (II and IV) pack together at the base of PA and
engage the upper surface of the CMG2 I domain surrounding the
MIDAS motif (Fig. 1), burying a large protein surface (1,900 Å²),
consistent with the very high affinity (sub-nanomolar dissociation
constant) of this interaction. The I domain adopts the 'open'
conformation, typical of integrin-ligand complexes. PA mimics
the ligand recognition mechanism of the integrins by contributing
an aspartic acid side chain that completes the coordination sphere
of the MIDAS magnesium ion, as predicted by mutagenesis
(Fig. 2a, b). This single interaction contributes substantially to
binding, as mutation of the aspartic acid to asparagine completely
eliminates toxicity, as does mutation of a metal-coordinating
residue on the receptor.

However, the MIDAS bond does not fully explain the specificity
of the interaction, as it does not distinguish between CMG2 and
integrins. Further specificity arises from two additional
interactions. First, PA domain IV docks onto the surface of CMG2
adjacent to the MIDAS motif. Domain IV comprises a β-sandwich
with an immunoglobulin-like fold, but the mode of binding is quite
different from that of antibody-antigen recognition. One of the
receptor loops (α2-α3) emanating from the MIDAS motif forms a
hydrophobic ridge that inserts into a groove formed by one edge of
the β-sandwich, where its hydrophobic core is exposed. Flanking this
ridge-in-groove arrangement are two further loops from CMG2,
which make a number of specific polar interactions and salt bridges
(Figs 3 and 4a). Together with the MIDAS contact, CMG2 and PA
domain IV bury 1,300 Å² of surface area, a value very similar to two
integrin-ligand interactions that have affinities in the
sub-micromolar range. CMG2 and TEM8 share 60% identity in their I
domains, and homology modelling based on the CMG2 structure
shows that this ridge is well conserved in TEM8 and their murine
counterparts, implying that they will bind PA in a similar fashion;
however, the structure and sequence of the ridge are very different in
integrins, explaining their weak binding (Fig. 4b).

Figure 2 The MIDAS motifs of the PA-CMG2 complex (a) and the collagen-integrin α2β1
complex (b). Coordinating side chains and two water molecules (w) are shown in
ball-and-stick representation. The metal is shown in blue. D683 from PA, and a collagen
glutamic acid, are in gold. Bond distances to the metal are 2.1 ± 0.2 Å in both cases. The
three MIDAS loops (L1-L3) are labelled in a.
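A figure such as "60% identity" depends on how identity is counted over an alignment. The sketch below shows one common convention, counting identical residues over aligned columns in which neither sequence has a gap; the letter does not say which convention was used, and the sequences in the usage lines are toy strings, not CMG2 or TEM8.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Convention (an assumption): columns where either sequence has a gap are
    skipped, and identity is computed over the remaining columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned sequences, for illustration only.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(percent_identity("ACD-FG", "ACDEFA"))
```

Dividing by the alignment length or by the shorter sequence instead would give different values for gapped alignments, which is why the convention should be stated alongside any identity figure.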

The interaction between PA domain II and CMG2 was not
anticipated. A β-hairpin from a well-ordered loop (β3-β4) at the
bottom of domain II inserts into a pocket on the receptor, burying
600 Å² of protein surface (Fig. 4b, c). This additional contact may
explain the very high affinity of the PA-CMG2 interaction. The
pocket is adjacent to the MIDAS motif and is formed by two
exposed tyrosine residues (Y119 and Y158) and the β4-α4 loop,
which line the sides of the pocket, and by a histidine (H121) at its
base. The pocket is conserved in TEM8, but does not exist in the I
domains of integrins, thus providing further specificity (Fig. 4b, c).
The importance of this loop was shown by systematic mutation of
the PA molecule, which revealed three mutations in this loop that
reduced toxicity by > 100-fold, including G342 at the tip of the
β-hairpin that inserts into the pocket.
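The three buried-surface values quoted in this section are additive: the domain IV contact (which includes the MIDAS bond) and the domain II contact together account for the total. Assuming the common definition of buried surface area (BSA) as the loss of solvent-accessible surface area (SASA) on complex formation, summed over both partners (the letter does not state which convention was used), the bookkeeping is:

```latex
\begin{align*}
\mathrm{BSA} &= \mathrm{SASA}_{\mathrm{PA}} + \mathrm{SASA}_{\mathrm{CMG2}}
               - \mathrm{SASA}_{\mathrm{PA\text{-}CMG2}} \\
\mathrm{BSA}_{\mathrm{total}} &\approx \mathrm{BSA}_{\mathrm{IV+MIDAS}} + \mathrm{BSA}_{\mathrm{II}}
               = 1{,}300 + 600 = 1{,}900~\mathrm{\AA}^{2}
\end{align*}
```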

Figure 3 Intermolecular contacts between PA domains II and IV and CMG2. Contacting
regions are coloured blue and green for CMG2 and PA domain IV, respectively. The
β2-β3 loop and flanking regions of PA domain II, which are implicated in pore formation,
are highlighted in red. The β2-β3 loop is disordered in monomeric PA and is shown
schematically as a dashed line. The histidine residues within PA domains II and IV and
within the CMG2 I domain are shown coloured cyan and are in ball-and-stick
representation. Mutation sites that reduce binding by > 100-fold (D683, S337, G342,
W346, I656, N657, I665, Y681, N682, P686, L687) are highlighted in gold.

Figure 4 Key elements of the PA-CMG2 interaction. a, Solvent-accessible surface of the
PA domain IV groove, with key side chains from three CMG2 loops (β1-α1, blue; β2-β3,
red; α2-α3, green) shown in ball-and-stick representation. The α2-α3 loop forms the
ridge. The MIDAS metal is labelled (M). b, Comparison with integrin I domains in the
'open' conformation (CMG2, red; αM, cyan; α2, green; αL, blue) overlaid on the MIDAS
motif. c, Surface of the CMG2 pocket into which the PA β3-β4 loop (red ribbon) inserts,
formed by three CMG2 side chains (shown in ball-and-stick representation) and the
β4-α4 loop (cyan).

Figure 5 Hypothetical model of the receptor-bound, membrane-inserted PA pore. The
model is based on the pre-pore PA63 crystal structure, channel conductance studies,
and the crystal structure of α-haemolysin. The barrel is formed by rearrangement in
each monomer of the segment shown in red in Fig. 3. Each PA63 monomer is shown in a
different colour. Residues 303-3… form the membrane-spanning region of the barrel.
Seven copies of the CMG2 I domain bound to the heptamer are in blue. The ~40 Å gap
between the CMG2 I domain and the membrane may be occupied by a ~100-residue
domain of CMG2, C-terminal to the I domain, which precedes its membrane-spanning
sequence.

Biophysical studies of channel conductance by PA63 pores
indicate that the entire region encompassed by residues 275-332
(strands β2 and β3 and flanking loops; see Fig. 3) in domain II
rearranges to form a long β-hairpin that lines the channel lumen.
This requires that the β2 and β3 strands and the β3-β4 loop peel
away from the side of domain II. For this to happen, domain IV,
which packs against them in the pre-pore, must separate at least
transiently from domain II. Thus, by binding to both domains II
and IV, CMG2 may restrain the conformational changes that lead to
membrane insertion. Indeed, whereas PA63 heptamers insert into
artificial planar bilayers (in the absence of receptor) when the pH is
reduced to 6.5, the pH requirement for receptor-mediated insertion
on cells is more stringent, requiring a pH of 5.5 (ref. 17). Thus, we
propose that the binding of CMG2 to the β3-β4 loop stabilizes the
pre-pore conformation at neutral pH; that is, the receptor may act as
a brace to prevent premature membrane insertion on the cell surface
before endocytosis. The pH profile of membrane insertion is
consistent with the titration of histidine residues, and seven of the
nine histidines within PA63 cluster at the domain II-IV interface
(Fig. 3). In addition, the histidine at the base of the CMG2 pocket
(conserved in TEM8) has no H-bonding partners, and is close to an
arginine side chain from the β3-β4 loop of PA. Histidine
protonation provides a plausible trigger for the release of domain II
from CMG2 in the acidified endosome. Indeed, we have shown that
the structure of the β3-β4 loop is pH-sensitive, as it becomes
disordered when crystals of PA grown at pH 7.5 (in the absence of
receptor) are reduced to pH 6.0 (ref. 18).
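The argument that the pH profile of membrane insertion is consistent with histidine titration can be made concrete with the Henderson-Hasselbalch equation. The sketch below assumes a single, non-interacting histidine side chain with a pKa of 6.0, a textbook value for free histidine; histidine pKa values in proteins can be shifted substantially by their environment, so the numbers are illustrative only.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a single titratable group in its protonated (acid) form.

    Henderson-Hasselbalch: pH = pKa + log10([base] / [acid]), hence
    [acid] / ([acid] + [base]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumption: textbook pKa of a free histidine imidazole; protein-induced shifts ignored.
PKA_HISTIDINE = 6.0

if __name__ == "__main__":
    # Crystal pH (7.5), planar-bilayer insertion pH (6.5), cellular insertion pH (5.5).
    for ph in (7.5, 6.5, 5.5):
        frac = fraction_protonated(ph, PKA_HISTIDINE)
        print(f"pH {ph}: {frac:.2f} of histidine side chains protonated")
```

Under this assumption a histidine is about 3% protonated at pH 7.5, about 24% at pH 6.5 and about 76% at pH 5.5, so the steep change in charge falls within the pH window discussed above.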

It is straightforward to model the 7:7 heptameric PA63-CMG2
complex, as the crystal structure of the pre-pore is known (Fig. 5).
Seven CMG2 I domains lie at the base of the heptameric 'cap',
increasing its height by 35 Å. The I domains are well separated,
consistent with a 7:7 binding stoichiometry, and their amino and
carboxy termini point downwards, towards the membrane. In the
transition from pre-pore to pore, the seven hairpin loops, one from
each PA monomer, are predicted to create a 14-stranded,
membrane-spanning β-barrel. Assuming an α-haemolysin-like
structure, the barrel extends ~75 Å below the I domains, with the
bottom 30 Å spanning the membrane. This leaves ~40 Å between
the bottom of the I domains and the membrane surface, which may
be occupied by the second domain of CMG2, which comprises
~100 residues between the I domain and its C-terminal
transmembrane sequence. Thus, the receptor may support the
heptamer at the correct height above the membrane for accurate
membrane insertion, which is stoichiometric on cells but less
efficient in the absence of receptor.

Soluble versions of the CMG2 and TEM8 I domains protect
against anthrax (Bacillus anthracis) toxin by acting as decoys, and
our structure will allow for the design of new therapeutic agents that
disrupt the PA-receptor interaction. TEM8 is strongly upregulated
on the surface of endothelial cells that line the blood vessels of
tumours, allowing for the development of anthrax toxin as an
anti-tumour agent; however, toxicity may arise as CMG2 is
expressed in most tissues. Although we expect the interactions of
TEM8 and CMG2 with PA to be very similar, there are significant
differences that may be exploited in the design of PA molecules that
would bind better to TEM8 than to CMG2, thus minimizing the
side effects from toxin binding to normal tissues. For example, V115
of CMG2, which lies at the heart of the interface with PA domain IV,
is a glycine in TEM8, whereas the rim of the pocket that accepts the
PA domain II loop has the sequence DGL in CMG2 but is replaced
by the sequence HED in TEM8.

Table 1 Data collection and refinement statistics
(Table body only partly legible in the scan; recoverable entry: Ramachandran plot, most
favoured region, 655 residues, 86.3%.)
Values in parentheses refer to the highest resolution shell (2.59-2.50 Å).
The three B-factor values are for Wilson, main chain and side chain, respectively.

Methods

Protein expression and purification

Full-length PA (residues 1-735) was prepared as previously described. The I domain of
human CMG2 was cloned as an N-terminal His-tag fusion in pET15b (Novagen) and
expressed in Escherichia coli strain BL21(DE3). After induction of cell cultures with
0.5 mM IPTG for 2 h at 37 °C, CMG2 was purified from the soluble fraction of the cell
lysate by nickel affinity chromatography (HiTrap chelating HP, Pharmacia), followed by
removal of the tag with thrombin (Sigma), ion exchange (HiTrap MonoQ, Pharmacia) and
gel filtration (Superdex S75, Pharmacia), affinity removal of thrombin (HiTrap
benzamidine FF, Pharmacia) and incubation in a buffer containing 100 mM EDTA to
strip bound metal. The final product was dialysed, concentrated to 15-20 mg ml⁻¹
and flash-frozen in 150 mM NaCl, 20 mM Tris-Cl pH 7.5; it comprises residues 40-218 of
CMG2 (GenBank accession number AAK77222) plus an N-terminal extension of sequence
GSHM…PRG as a result of the cloning strategy. The molecular mass was confirmed by
matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. To prepare
the PA-CMG2 complex, PA was mixed at a final concentration of 4 mg ml⁻¹ with a
threefold molar excess of CMG2 and a twofold excess of MnCl2, incubated for 20 min at
room temperature and purified by gel filtration (Superdex S200, Pharmacia). The complex
was extensively dialysed and exchanged, and concentrated to 6 mg ml⁻¹ in 20 mM Tris-Cl
pH 7.5, 10 µM MnCl2 for crystallization trials.
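Setting up "a threefold molar excess of CMG2" at a fixed PA concentration is a unit conversion: mg ml⁻¹ divided by kDa gives mmol l⁻¹. The sketch below makes that arithmetic explicit. The molecular masses are assumptions not given in the Methods: about 83 kDa for full-length PA (PA83) and a rough 21 kDa estimate for the CMG2 I-domain construct.

```python
def molar_mM(conc_mg_per_ml: float, mw_kda: float) -> float:
    """Convert a mass concentration to millimolar: (mg/ml) / (kDa) = mmol/l."""
    return conc_mg_per_ml / mw_kda


def partner_mg_per_ml(ref_mg_per_ml: float, ref_kda: float,
                      partner_kda: float, molar_excess: float) -> float:
    """Mass concentration of a binding partner giving the requested molar excess."""
    return molar_mM(ref_mg_per_ml, ref_kda) * molar_excess * partner_kda


# Assumed masses (not stated in the Methods).
PA_KDA = 83.0    # full-length PA83
CMG2_KDA = 21.0  # CMG2 residues 40-218 plus tag remnant, rough estimate

if __name__ == "__main__":
    pa_um = 1000.0 * molar_mM(4.0, PA_KDA)
    cmg2 = partner_mg_per_ml(4.0, PA_KDA, CMG2_KDA, molar_excess=3.0)
    print(f"PA: {pa_um:.1f} uM; CMG2 for a 3x molar excess: {cmg2:.2f} mg/ml")
```

With these assumed masses, 4 mg ml⁻¹ PA is roughly 48 µM, and a threefold excess of the small receptor domain needs only about 3 mg ml⁻¹ CMG2.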

Crystallization and structure solution

Needle-like crystals grew to a size of 10 × 10 × 500 µm in 5-10 days at room temperature
in a sitting-drop vapour diffusion set-up using a reservoir buffer containing 50-100 mM
CHES pH 9.0-9.2, 25% PEG400. Crystals were flash-frozen in liquid nitrogen using
the crystallization buffer with 40% PEG400 as a cryoprotectant before diffraction
analysis. The crystals belong to space group P2₁2₁2₁ with unit cell parameters a = 88.2 Å,
b = 94.2 Å, c = 135.6 Å. There is one PA-CMG2 complex in the asymmetric unit. A
complete native data set to 2.5 Å was collected at beamline 9-1 at SSRL on an ADSC
Quantum-315 CCD detector and processed with the HKL package (see Table 1). PA was
positioned in the unit cell by molecular replacement (Protein Data Bank (PDB) ID code
1ACC) using MOLREP, and refined with REFMAC version 5.0 (ref. 24). Density for the
MIDAS Mn²⁺ ion and upper loops of the receptor was evident in this map, and a molecule
of CMG2 (PDB ID code 1SH…) was manually placed in the electron density. Model
building was performed with O and TURBO-FRODO (A. Roussel and C. Cambillau,
Silicon Graphics), and the solvent structure was built with ARP/wARP 6.0 (ref. 26).
Although the random errors in the diffraction data are high, owing to the small crystal size,
the final refinement statistics and maps are excellent (Table 1). Thus, the final R-factors are
R_free = 26.6% and R_work = 20.7% overall, and R_free = 37.2% and R_work = 2[?].3% in the
outer resolution bin, with root-mean-square deviations (r.m.s.d.) from ideal values of
0.017 Å for bond lengths and [illegible]° for angles. Stereochemistry is excellent as assessed
with PROCHECK, and the model is consistent with composite simulated annealing omit
maps (5,000 K) calculated in CNS. The model comprises residues 16-735 of PA and 41-210
of CMG2, with the exception of three loops in PA (residues 159-174, 276-287 and 304-319)
for which no electron density was observed; 139 water molecules; two Ca²⁺ ions in PA
domain I; two Na⁺ ions; one [illegible] molecule; and one Mn²⁺ ion at the MIDAS site. The
B-factors for the Ca²⁺ and Mn²⁺ ions (27-33 Å²) are higher than for the coordinating
residues (16-20 Å²). Although the MIDAS metal ion in vivo is likely to be Mg²⁺, we have
previously shown for integrin I domains that the stereochemistry of the open
conformation is not dependent on the nature of the metal ion. The bond lengths to the
Mn²⁺ ion are 2.1 ± 0.2 Å, identical to those observed in integrin-ligand complexes.
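The R-factors quoted above follow the standard crystallographic definition, R = Σ| |F_obs| - |F_calc| | / Σ|F_obs|, with R_free computed on a randomly chosen test set of reflections that is excluded from refinement. The sketch below illustrates that definition on plain lists of structure-factor amplitudes. It assumes the calculated amplitudes are already scaled to the observed ones and uses a hypothetical 5% test set; it is not how REFMAC computes or reports these values, since real refinement programs also handle scaling and weighting.

```python
import random
from typing import List, Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); Fcalc assumed pre-scaled to Fobs."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_work_free(n: int, free_fraction: float = 0.05,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly flag a test set of reflection indices (held out for R_free)."""
    rng = random.Random(seed)
    n_free = max(1, round(free_fraction * n))
    free = sorted(rng.sample(range(n), n_free))
    free_set = set(free)
    work = [i for i in range(n) if i not in free_set]
    return work, free


def _select(indices: Sequence[int], values: Sequence[float]) -> List[float]:
    return [values[i] for i in indices]


def r_work_and_free(f_obs: Sequence[float], f_calc: Sequence[float],
                    work: Sequence[int], free: Sequence[int]) -> Tuple[float, float]:
    """R over the working set (used in refinement) and over the held-out test set."""
    r_work = r_factor(_select(work, f_obs), _select(work, f_calc))
    r_free = r_factor(_select(free, f_obs), _select(free, f_calc))
    return r_work, r_free
```

Because the test reflections never influence the model, R_free is normally higher than R_work, as in the values reported above; a large gap between them is the usual warning sign of overfitting.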

PA domain I (residues 16-258) undergoes a small rotation as a consequence of crystal
constraints when compared with the structure of isolated PA, such that the r.m.s.d. values
for the superposition of the two molecules are 1.41, 0.58 and 0.79 Å for residues 16-735,
259-735 and 16-258, respectively. CMG2 residues 41-200 superimpose with an r.m.s.d. of
0.60 Å with the isolated protein, while the C-terminal helix (residues 201-210) shifts
downwards by one helical turn.
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We  report  a  structural  and  functional  analysis  of 
the  X  prophage  Ba02  endolysin  (PlyL)  encoded  by 
the  Bacillus  anthracis  genome.  We  show  that 
PlyL  comprises  two  autonomously  folded 
domains,  an  N-terminal  catalytic  domain  and  a 
C-terminal  cell  wall-binding  domain  (CBD).  We 
determined  the  crystal  structure  of  the  catalytic 
domain;  its  three-dimensional  fold  is  related  to 
that  of  the  cell  wall  amidase,  T7  lysozyme,  and 
contains  a  conserved  Zn  coordination  site  and 
other  components  of  the  catalytic  machinery.  We 
demonstrate  that  PlyL  is  an  N-acetylmuramoyl- 
L-alanine  amidase  that  cleaves  the  cell  wall  of 
several  Bacillus  species  when  applied 
exogenously.  We  show,  unexpectedly,  that  the 
catalytic  domain  of  PlyL  cleaves  more  efficiently 
than  the  full-length  protein,  except  in  the  case  of 
B.  cereus;  and  using  GFP-tagged  CBD,  we 
detected  strong  binding  of  the  cell  wall-binding 
domain  to  B.  cereus  but  not  to  other  species 
tested.  To  explain  these  data,  and  the  species 
specificity  of  PlyL,  we  propose  that  the  C- 
terminal  domain  inhibits  the  activity  of  the 
catalytic  domain  through  intramolecular 
interactions  that  are  relieved  upon  binding  of  the 
C-terminal  domain  to  the  cell  wall.  Furthermore, 
our  data  show  that  (when  applied  exogenously) 
targeting  of  the  enzyme  to  the  cell  wall  is  not  a 
prerequisite for its lytic activity, which is
inherently high. Thus, the catalytic domain of
PlyL might be developed as a therapeutic agent
with broad efficacy against Gram-positive
bacteria. 

Endolysins are bacteriophage-encoded enzymes that lyse the host bacterial cell wall during the lytic phase of the phage infectious cycle. They typically consist of an N-terminal catalytic domain and a C-terminal domain that targets the enzyme to the cell wall, providing high species and strain specificity (1,2). For example, the Listeria monocytogenes lysins Ply118 and Ply500 specifically hydrolyse Listeria cells but are inactive in the absence of the cell wall-binding domain (1).

A comparative genome analysis of Bacillus anthracis revealed a gene encoding a putative endolysin within the integrated copy of the λBa02 prophage, which we will call PlyL. The catalytic domain of PlyL shares a high degree of sequence similarity with that of an endolysin from bacteriophage γ (PlyG) (3,4), which specifically lyses and kills B. anthracis and closely related species when added exogenously to bacterial cultures. For this reason, PlyG is being developed as a diagnostic and therapeutic agent (5).

Here  we  describe  a  structural  and  functional 
analysis  of  PlyL.  We  show  that  the  N-terminal 
(catalytic)  domain  is  an  amidase  with  high 
inherent  lytic  activity  against  the  cell  wall  of 
several  Bacillus  species.  In  contrast  to  many 
previously  described  enzymes,  we  have  found  that 
the  presence  of  the  C-terminal  domain  either 
reduces  or  has  no  effect  on  the  lytic  activity.  This 
unexpected  finding  suggests  that  the  catalytic 
domain  of  PlyL  could  be  developed  as  a  relatively 
broad-spectrum  antibacterial  agent. 

MATERIALS  AND  METHODS 

Cloning and expression of full-length endolysin and C-terminal domain - Full-length PlyL was cloned by PCR from Bacillus anthracis Ames strain total DNA extract prepared by Dr Phil Hanna (University of Michigan Medical School) using the oligonucleotide primers 5'-AAAGGAGATATACATATGGAAATCAGAAAAAAATTAGTT-3' (forward) and 5'-GAATTCGGATCCTCATTATTTATCATCATACCACCAATC-3' (reverse). For the C-terminal domain, we used the forward primer 5'-GGAGATATACATATGGCAAGTGCAACGGTAACCCCTAAA-3' with the same reverse primer. PCR products were cloned into pET22b (Novagen) via NdeI and BamHI restriction sites (without tag). The resulting plasmids were transformed into BL21(DE3) (Novagen) for protein expression. Full-length and C-terminal domain proteins were expressed using the same protocol. Transformed cells from overnight plates were used to inoculate 1 L of 2xTY medium (16 g/L tryptone, 10 g/L yeast extract, and 5 g/L NaCl; with 100 µg/ml ampicillin) and allowed to grow to an OD600 of 1.0 at 37°C. IPTG (1 mM) was then added to induce protein expression for three hours at 37°C.
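The cloning strategy above relies on restriction sites built into the 5' ends of the primers (NdeI and BamHI). As a minimal sketch, assuming only the standard NdeI (CATATG) and BamHI (GGATCC) recognition sequences, the Python below locates each site in the three primer sequences given above; `find_sites` and `PRIMERS` are hypothetical names introduced for illustration. Because both recognition sequences are palindromic, scanning one strand is sufficient.

```python
# Minimal sketch: locate restriction-enzyme recognition sites in primer sequences.
# Assumption: only NdeI (CATATG) and BamHI (GGATCC) are checked; both sites are
# palindromic, so scanning the given strand also covers the complementary strand.
from typing import Dict, List

SITES: Dict[str, str] = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primer sequences as given in the Methods (5'->3').
PRIMERS: Dict[str, str] = {
    "full_length_forward": "AAAGGAGATATACATATGGAAATCAGAAAAAAATTAGTT",
    "c_terminal_forward": "GGAGATATACATATGGCAAGTGCAACGGTAACCCCTAAA",
    "shared_reverse": "GAATTCGGATCCTCATTATTTATCATCATACCACCAATC",
}


def find_sites(seq: str, sites: Dict[str, str] = SITES) -> Dict[str, List[int]]:
    """Return 0-based start positions of every recognition site present in seq."""
    seq = seq.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

The same scan can be extended to other enzymes by adding entries to `SITES`; non-palindromic sites would additionally require scanning the reverse complement.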

Full-length PlyL purification - Cells were harvested by centrifugation at 4°C. The cell pellet was resuspended in 30 ml of lysis buffer (50 mM Na-MES (morpholinoethanesulfonic acid) pH 6.0, 10 mM β-mercaptoethanol, 0.1% Triton X-100, and 0.1 mM ZnSO4). Resuspended cells were lysed by sonication, and the lysate was clarified by centrifugation for 1 hour at 4°C. The clarified lysate was loaded directly onto a 5 ml HiTrap SP column on an ÄKTA FPLC (Amersham Biosciences) equilibrated with 50 ml buffer A (50 mM Na-MES pH 6.0, 10 mM β-mercaptoethanol, and 0.1 mM ZnSO4). Unbound protein was removed by washing the column with 50 ml buffer A. The protein was eluted with a 0-1 M NaCl gradient in buffer A (total volume 50 ml). Fractions containing full-length PlyL that was more than 90% pure, as verified by SDS-PAGE, were pooled and concentrated to 10-20 mg/ml.

Purification and crystallization of the catalytic domain. The N-terminal catalytic domain was generated by limited proteolysis of full-length PlyL with elastase (ratio 1:100) at room temperature for 16 hours. A Superdex S75 16/60 column (Amersham Biosciences) in 20 mM Tris-Cl (pH 7.0), 100 mM NaCl, 10 mM β-mercaptoethanol was used as the final purification step. The purified protein was concentrated to 20 mg/ml. Mass spectrometry and amino acid analysis revealed that elastase cleaved after residue Val159. The protein appeared as a single band on SDS-PAGE, and its molecular weight was confirmed by MALDI-MS. Crystals were obtained by hanging-drop vapor diffusion at 20°C, using a reservoir of 0.6 M NaH2PO4, 1.0 M K2HPO4, 0.1 M acetate at pH 6.7. Each drop consisted of 2 µl protein and 1 µl buffer. Crystals grew as hexagonal rods to 0.1 mm × 0.1 mm × 0.3 mm in three days at room temperature. They adopt space group P61 with cell dimensions a = b = 163.2 Å, c = 37.3 Å. For cryo data collection, the crystals were soaked stepwise in crystallization buffer containing 10% glycerol and then in buffer supplemented with 20% glycerol. All X-ray data sets were collected at 100 K.

C-terminal domain purification and crystallization - Bacterial cell extracts were prepared as described above. The supernatant was loaded onto an equilibrated Ni-NTA column and washed with 10 column volumes of wash buffer. The elution buffer was similar to the wash buffer but included 300 mM imidazole. The eluate was passed directly onto an equilibrated gel filtration column (20 mM Tris-Cl and 100 mM NaCl at pH 7.0). The His-tagged C-terminal domain eluted as a monomer. Crystals of the 75-amino-acid C-terminal domain were obtained by hanging-drop vapor diffusion against 1.5 M (NH4)2SO4 and 10% glycerol in Tris-Cl pH 7.0. The crystals grew to 0.1 mm × 0.1 mm × 0.3 mm in seven days at room temperature; they diffract to 2.7 Å resolution using a Rigaku FR-E High Brilliance X-ray generator and adopt space group P41212 with cell dimensions a = b = 52.5 Å, c = 224.2 Å.

Structure determination of the catalytic domain. MAD data sets were collected at beamline 9-2 at the Stanford Synchrotron Radiation Laboratory using a MAR345 image plate and processed using the programs DENZO and SCALEPACK (6). The presence of a zinc ion in the crystal was confirmed by a fluorescence scan at the Zn L-I edge. Eighteen selenomethionine sites were found using SOLVE (7) and used for phase calculation to a resolution of 2.0 Å. An initial model was generated by RESOLVE (8); further model building was done using O (9), and the model was refined with CNS (10) (version 1.1 on Mac OS X). Native crystals were obtained under identical conditions. A native data set was collected in-house with a Rigaku FR-E High Brilliance X-ray generator and an R-AXIS IV detector. The CNS-refined model of the selenomethionine structure was used as the starting model for native refinement to a resolution of 1.86 Å. There are three molecules (A-C) in the asymmetric unit, and the solvent content is 46%. Density for the last two amino acids of molecules A and C is missing. Molecule B has the most complete density throughout, and its B-factors are lower than those of the other two molecules. Refinement statistics are presented in Table I. The coordinates and structure factors have been deposited in the PDB with accession code 1YB0.

Assay of lytic activity. The activity of PlyL applied exogenously to cultures of B. anthracis (Sterne 34F2), B. cereus ATCC 4342, B. megaterium WH320, B. subtilis 168 and Escherichia coli CFT073 was tested. Cultures were grown to mid-exponential phase, and cells were harvested and resuspended in 10 mM sodium phosphate (pH 7.0). Lysis of the cell suspensions upon addition of 2 to 4 µM purified endolysin was monitored at 600 nm.
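Lysis in this assay is summarized (Fig. 4C) as t1/2, the time needed for the OD600 of a suspension to fall to half its starting value. The interpolation procedure is not stated here, so the following is only a minimal sketch assuming OD600 readings at discrete time points and linear interpolation between the two readings that bracket the half-way value; `half_time` and the example readings are hypothetical.

```python
# Minimal sketch: estimate t1/2, the time for OD600 to fall to half its starting value.
# Assumptions (not stated in the source): readings are sorted by time, and t1/2 is
# obtained by linear interpolation between the two readings that bracket OD600(0) / 2.
from typing import Optional, Sequence


def half_time(times: Sequence[float], od600: Sequence[float]) -> Optional[float]:
    """Return the interpolated time at which OD600 first falls to half its initial value,
    or None if the suspension does not lyse that far within the recorded window."""
    if len(times) != len(od600) or len(times) < 2:
        raise ValueError("need matching time and OD600 series with at least two points")
    target = od600[0] / 2.0
    for i in range(1, len(times)):
        t0, t1 = times[i - 1], times[i]
        y0, y1 = od600[i - 1], od600[i]
        if y1 <= target:
            if y1 == y0:
                # Degenerate flat segment: no slope to interpolate along.
                return t1
            # Linear interpolation between (t0, y0) and (t1, y1).
            return t0 + (target - y0) * (t1 - t0) / (y1 - y0)
    return None


if __name__ == "__main__":
    # Hypothetical readings (minutes, OD600), for illustration only.
    print(half_time([0, 5, 10, 15], [1.0, 0.8, 0.4, 0.2]))
```

Averaging t1/2 over independent replicates then gives the kind of mean and standard deviation plotted in Fig. 4C.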

Determination of the cleavage site in peptidoglycan. A peptidoglycan suspension (0.5 mg/ml) from B. subtilis (Fluka) was incubated at 37°C with purified PlyL (0.4 µM) in 10 ml of Good's buffer (20 mM Na-MES, pH 6.5) containing 100 mM KCl. Boiled PlyL was used as a control. After incubation for 30, 60, and 120 min, samples were boiled and centrifuged at 13,000 rpm, and the clear supernatants were analyzed for the release of free amino acids using a modified version of the protocol described in (12). Aliquots of 100 µl were mixed with 12 µl of 10% K2B4O7, 10 µl of 1-fluoro-2,4-dinitrobenzene solution (0.1 M in ethanol) was added, and the mixture was heated at 65°C for 45 min in the dark. Following acid hydrolysis in 4 M HCl for 12 h at 95°C, the dinitrophenyl (DNP)-labeled compounds were analyzed by HPLC on a reverse-phase column (C18, 4.6 × 150 mm, Vydac). The labeled amino acids were eluted with a linear gradient from 90% A + 10% B to 30% A + 70% B (A: 10% acetonitrile in 20 mM acetic acid; B: 90% acetonitrile in 20 mM acetic acid) and detected at 365 nm. The release of free reducing groups during the enzymatic reaction was measured by a modified Morgan-Elson reaction (12) using N-acetylglucosamine as the standard.
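Because both HPLC solvents already contain acetonitrile (A: 10%, B: 90%), the actual acetonitrile content of the mobile phase runs from 0.9 × 10 + 0.1 × 90 = 18% at the start of the gradient to 0.3 × 10 + 0.7 × 90 = 66% at the end. A minimal sketch of this arithmetic, assuming a strictly linear ramp; the gradient duration is not given in the text, so `duration_min` is a hypothetical parameter.

```python
# Minimal sketch: effective acetonitrile (ACN) content during a linear two-solvent gradient.
# Solvent compositions are from the Methods (A: 10% ACN, B: 90% ACN, both in 20 mM acetic
# acid); the gradient duration is NOT stated in the source and is a hypothetical parameter.

ACN_IN_A = 10.0  # % acetonitrile in solvent A
ACN_IN_B = 90.0  # % acetonitrile in solvent B


def percent_b(t_min: float, duration_min: float,
              start_b: float = 10.0, end_b: float = 70.0) -> float:
    """%B at time t for a linear ramp from start_b to end_b over duration_min minutes."""
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    frac = min(max(t_min / duration_min, 0.0), 1.0)  # clamp to the gradient window
    return start_b + frac * (end_b - start_b)


def percent_acn(pct_b: float) -> float:
    """Overall %ACN of a mixture of pct_b % solvent B and (100 - pct_b) % solvent A."""
    return ((100.0 - pct_b) * ACN_IN_A + pct_b * ACN_IN_B) / 100.0


if __name__ == "__main__":
    duration = 30.0  # hypothetical gradient length (min)
    for t in (0.0, 15.0, 30.0):
        b = percent_b(t, duration)
        print(f"t = {t:4.1f} min: {b:.0f}% B -> {percent_acn(b):.0f}% acetonitrile")
```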

C-terminal domain cell binding assay. A modified green fluorescent protein (GFP) gene (gift of Dr Ruchika Gupta) was PCR-amplified using the oligonucleotide primers 5'-CGCGGCAGCCATATGGTGAGCAAGGGCGAGGAGCTGTTC-3' and 5'-GCCCGGATCCTCGAGTTACTTGTACAGCTCGTCCATGCC-3'. The resulting fragment was digested with NdeI and XhoI and ligated with the XhoI-BamHI fragment of the PlyL C-terminal domain, which was amplified using 5'-AGCCATATGCTCGAGATGGCAAGTGCAACGGTAACCCCT-3' (forward) and the same reverse oligonucleotide used to clone the full-length protein. The GFP-C-terminal domain fusion and a GFP control were cloned into a pET15b vector via NdeI and BamHI or NdeI and XhoI, respectively. Both proteins were expressed and purified using Ni-NTA affinity chromatography and gel filtration as described above. Cell samples for the binding assays were obtained by growing Bacillus cultures to late log phase. Cells were harvested, washed with PBS-T (PBS + 0.1% Tween-20), and incubated with 0.4 mM protein (GFP-C-domain fusion or GFP control) for 5 min at room temperature, followed by three washes with PBS-T. The washed cells were smeared onto a microscope slide for confocal image analysis with a Bio-Rad Radiance 2100 Multiphoton Laser Scanning Confocal Microscope system equipped with an argon laser (Image Analysis and Histology Facilities, The Burnham Institute). A 60× oil-immersion LSM objective (N.A. 1.0; Olympus) was used at zoom 5, and GFP was excited at 488 nm.

RESULTS 

Identification and characterization of PlyL - A BLAST search (http://www.ncbi.nlm.nih.gov/BLAST/) using the γ phage endolysin PlyG as the query sequence identified two genes encoding putative endolysins located within integrated prophages of B. anthracis. The λBa01 and λBa02 endolysins are annotated as BA3767 and BA4073 (“PlyL”), respectively, in the genome sequence of B. anthracis Ames (NCBI accession number NC_003997). Additional endolysins from other Bacillus species and their phages were also detected in this search; those with greater than 30% identity over their catalytic domains are shown in Figure 1. PlyL is most closely related to PlyG in both the enzymatic (93% identity) and C-terminal (60% identity) domains. BA3767 is also very similar but lacks the C-terminal domain.
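Identity figures such as the 93% and 60% quoted above depend on the alignment and on which columns are counted. As a minimal sketch, assuming a pre-computed pairwise alignment with '-' as the gap character and counting only columns where neither sequence has a gap (one common convention; others divide by the full alignment length and give somewhat different values), identity and the list of substituted columns could be computed as below. `percent_identity`, `substitutions` and the toy sequences are hypothetical.

```python
# Minimal sketch: percent identity and substituted columns for two pre-aligned sequences.
# Assumption: '-' marks a gap and only ungapped columns are compared.
from typing import List, Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped column: not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def substitutions(aligned_a: str, aligned_b: str, gap: str = "-") -> List[Tuple[int, str, str]]:
    """(1-based alignment column, residue in a, residue in b) for each ungapped mismatch.
    Note: alignment columns equal residue numbers only when there are no gaps upstream."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [(i + 1, x, y)
            for i, (x, y) in enumerate(zip(aligned_a.upper(), aligned_b.upper()))
            if x != gap and y != gap and x != y]


if __name__ == "__main__":
    a, b = "MKV-LA", "MRVALA"  # toy alignment, not real PlyL/PlyG sequence
    print(percent_identity(a, b), substitutions(a, b))
```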

We cloned and expressed the B. anthracis gene encoding BA4073/PlyL. Crystallization trials of the full-length protein were unsuccessful. However, limited proteolysis using elastase allowed us to isolate a stable N-terminal fragment (residues 1-159). Cleavage occurs at the junction between the predicted catalytic and cell wall-binding domains. This fragment was much more soluble than the full-length protein (> 40 mg/ml versus < 3 mg/ml) and crystallized readily. We also crystallized the C-terminal domain; although we have not yet solved its structure, the existence of crystals that diffract to 2.7 Å resolution indicates that it is an autonomously folded domain.

N-acetylmuramoyl-L-alanine amidase activity of PlyL resides in its N-terminal domain - To assess the enzymatic activity of PlyL, peptidoglycan from B. subtilis was treated with full-length PlyL and with the elastase-generated N-terminal fragment. No increase in free reducing groups derived from peptidoglycan was observed, indicating that the enzyme is neither a glucosaminidase nor a muramidase. The free amino groups of the digested (solubilized) products were labeled with 1-fluoro-2,4-dinitrobenzene. After acid hydrolysis, the DNP-labeled compounds were separated by HPLC. Only the amount of DNP-alanine increased significantly (Supplementary Fig. 1A), indicating that the enzyme is an N-acetylmuramoyl-L-alanine amidase that specifically cleaves the amide bond between N-acetylmuramic acid and L-alanine. The same result was observed for the N-terminal proteolytic fragment, showing that it comprises a complete catalytic domain. The N-terminal domain was more active than the full-length protein in this assay (Supplementary Fig. 1B), providing the first indication that the C-terminal domain is autoinhibitory.
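The comparison above rests on two alanine-release time courses (samples at 30, 60 and 120 min). One simple way to put a number on such a comparison, offered as an assumption rather than the analysis behind Supplementary Fig. 1B, is a least-squares rate constrained through the origin, which is only meaningful while release is roughly linear in time. `rate_through_origin` and the example values are hypothetical.

```python
# Minimal sketch: least-squares slope through the origin for a release time course.
# Assumptions: release is ~0 at t = 0 and approximately linear over the sampled window;
# the usage values are hypothetical placeholders, not measured data.
from typing import Sequence


def rate_through_origin(times: Sequence[float], released: Sequence[float]) -> float:
    """Slope k minimizing sum (released_i - k * t_i)^2, i.e. k = sum(t*y) / sum(t^2)."""
    if len(times) != len(released) or not times:
        raise ValueError("need matching, non-empty series")
    sxx = sum(t * t for t in times)
    if sxx == 0:
        raise ValueError("all time points are zero")
    sxy = sum(t * y for t, y in zip(times, released))
    return sxy / sxx


if __name__ == "__main__":
    t = [30.0, 60.0, 120.0]               # sampling times used in the Methods (min)
    catalytic = [3.0, 6.0, 12.0]          # hypothetical release, arbitrary units
    full_length = [1.0, 2.2, 4.1]         # hypothetical release, arbitrary units
    ratio = rate_through_origin(t, catalytic) / rate_through_origin(t, full_length)
    print(f"rate ratio (catalytic / full-length): {ratio:.2f}")
```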

Structure of the PlyL N-terminal domain - We solved the structure of the PlyL catalytic domain (residues 1-159) at 1.86 Å resolution using MAD phasing from a selenomethionine-substituted protein (Table I). The fold is most similar to those of T7 lysozyme (13), Citrobacter AmpD (14) and the Drosophila peptidoglycan recognition protein PGRP-LB (15), with which it shares 10-20% sequence identity. For consistency, we have followed the strand and helix nomenclature of T7 lysozyme. The overall fold consists of a six-stranded β-sheet flanked by four long α-helices (one at the front (α1) and three at the back (α2, α3 and α4)), as well as a number of elaborate loops with short α-helical segments (Figs. 2, 3A). Compared with T7 lysozyme, an N-terminal extension creates an additional β-strand (β0) at one end of the sheet. A zinc ion binds to the front face of the molecule at the center of the active site, coordinated by His29 from strand β1 and by two residues, His129 and Cys137, on either side of strand β5. The fourth ligand is a phosphate (or sulfate) ion from the crystallization buffer.

Enzyme active site - The active site is solvent-exposed and lies in a shallow groove on the protein surface, consistent with the ability to cleave a highly cross-linked and branched polymer. Helix α1 packs more closely against the β-sheet in PlyL than in T7 lysozyme, so the pronounced substrate-binding groove observed in T7 lysozyme is not seen in PlyL. The active site can be overlaid closely with that of T7 lysozyme (Fig. 3B). The three zinc-coordinating residues (His29, His129 and Cys137) are conserved between PlyL and T7 lysozyme (the third zinc-coordinating residue is an Asp in Citrobacter AmpD). PlyL Lys135 is structurally analogous to Lys128 of T7 lysozyme, which has been shown to be important for catalysis (13), perhaps by stabilizing the developing negative charge on the amide carbonyl in the transition state; however, PGRP-LB has a threonine at this position. Tyr46 in T7 lysozyme and Tyr78 in PGRP-LB are important for catalysis and are thought to act as the general base that activates the nucleophilic water molecule. On the basis of sequence alignment, the analogous residue in PlyL was predicted to be Phe53. However, in the crystal structure the side chain of Phe53 adopts a different orientation, and the carboxylate group of Glu90 (from a neighboring strand) occupies the space analogous to the T7 tyrosine. To test a catalytic role for Glu90 in PlyL, we mutated it to alanine; this mutation completely abolished the amidase activity (data not shown).

The N-terminal domains of PlyL and PlyG differ at only ten amino acid positions, so their 3D structures should be almost identical. These differences are mapped onto the three-dimensional model of PlyL (Fig. 3A). Most of them are located on the surface of the molecule, and all are distant from the active site and a putative substrate-binding cleft, suggesting that the two catalytic domains should have similar or identical substrate specificity and catalytic activity.
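The argument that the PlyL/PlyG substitutions lie far from the active site can be checked against the deposited coordinates by measuring the distance from each substituted residue to the zinc ion. A minimal sketch, assuming one representative (x, y, z) coordinate per residue (for example its Cα atom) has already been parsed from the PDB entry; the residue labels, coordinates and cutoff in the example are hypothetical.

```python
# Minimal sketch: distance from selected residues to an active-site reference point.
# Assumptions: one representative coordinate per residue has already been extracted from
# the PDB file; the cutoff is an arbitrary illustrative choice; example data are made up.
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def distances_to_site(residues: Dict[str, Coord], site: Coord) -> Dict[str, float]:
    """Euclidean distance from each residue's reference atom to the site (Å for PDB data)."""
    return {name: math.dist(xyz, site) for name, xyz in residues.items()}


def residues_within(residues: Dict[str, Coord], site: Coord, cutoff: float) -> List[str]:
    """Sorted names of residues whose reference atom lies within cutoff of the site."""
    return sorted(name for name, d in distances_to_site(residues, site).items() if d <= cutoff)


if __name__ == "__main__":
    zinc = (0.0, 0.0, 0.0)                                   # hypothetical Zn position
    subs = {"res1": (3.0, 4.0, 0.0), "res2": (20.0, 0.0, 0.0)}  # hypothetical residues
    print(distances_to_site(subs, zinc))
    print(residues_within(subs, zinc, cutoff=10.0))
```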

Lytic activity of PlyL - We next examined the lytic activity of PlyL on whole cells of several bacilli, as measured by light scattering (OD600) (Fig. 4) and confirmed by microscopy. We found that full-length PlyL lysed B. cereus with an efficiency comparable to that reported for PlyG on B. anthracis and some strains of B. cereus (5). However, in marked contrast with PlyG, PlyL showed relatively high lytic activity on B. megaterium and lower but detectable activity on B. subtilis and B. anthracis.

We  found,  unexpectedly,  that  the  N-terminal 
catalytic  domain  of  PlyL  is  more  active  than  the  full- 
length  protein  in  lysing  B.  subtilis,  B.  megaterium 
and  B.  anthracis  cells.  The  strongest  enhancement 
was  observed  on  B.  subtilis  (Fig.  4B,  C).  By 
contrast,  the  removal  of  the  C-terminal  domain  had 
almost  no  effect  on  the  lytic  activity  towards  B. 
cereus. 

To  further  assess  the  role  of  the  C-terminal  domain 
of  PlyL,  we  performed  cell-binding  studies  using  a 
recombinant  C-terminal  domain  fused  with  GFP. 
When the fusion was added to B. cereus and viewed under a confocal microscope, clear green fluorescence was observed around the cells (Fig. 4D). No binding was observed with B. megaterium or B. subtilis (B. anthracis was not tested).


DISCUSSION 

We have shown that the endolysin from the B. anthracis λ prophage Ba02, PlyL, is a bona fide cell wall lytic amidase with a modular organization comprising an N-terminal catalytic domain and a C-terminal cell wall-binding domain. We determined the atomic resolution three-dimensional structure of the catalytic domain and showed that its overall fold and active site are similar to, but distinct from, those of T7 lysozyme and other amidases. The zinc-coordinating residues His29, His129 and Cys137 are invariant among the Bacillus endolysins listed in Figure 1, as are the other active-site residues, Glu90 and Lys135. The role of Glu90 was not predicted from sequence alignments with T7 lysozyme, but its side chain occupies a spatial location similar to that of the general base Tyr in T7 lysozyme, and we demonstrated by mutagenesis that Glu90 is critical for catalysis. Our results suggest that all of the enzymes listed in Figure 1 should have N-acetylmuramoyl-L-alanine amidase activity and a similar catalytic mechanism (as was already demonstrated for the TP21 endolysin (1)). In particular, the 10 residues that differ between PlyL and PlyG do not lie close to the active site, so their distinct lytic specificities are presumably conferred by the C-terminal domain, which is less conserved.

We  showed  that  the  C-terminal  domain  is  indeed  a 
cell  wall-binding  domain  (CBD)  and  that  it 
interacts  specifically  with  B.  cereus  cells.  We 
further  showed  that  the  presence  of  the  CBD 
within  the  full-length  PlyL  has  an  inhibitory  effect 
on  the  lytic  activity  of  the  catalytic  domain  when 
tested  with  peptidoglycan  or  with  the  whole  cells 
of  B.  subtilis  and,  to  a  lesser  extent,  with  B. 
megaterium  and  B.  anthracis.  By  contrast,  the 
presence  of  the  CBD  had  a  negligible  effect  on  the 
activity  of  PlyL  towards  B.  cereus. 

To reconcile these observations, we propose that the C-terminal domain of PlyL has a dual function (Fig. 5): (i) in the absence of a specific interaction with the cognate cell wall, the CBD plays an autoinhibitory role, similar to a propeptide in zymogens, by binding to the catalytic domain and blocking access to the active site either sterically or allosterically; and (ii) the CBD participates in species-specific cell wall binding (recognition), which disrupts the interaction between the CBD and the catalytic domain, thus relieving the inhibitory effect. For example, the marked difference in activity between full-length PlyL and the free N-terminal domain against B. subtilis can be explained by very weak binding of the CBD to the B. subtilis cell wall, combined with an intrinsic sensitivity of that cell wall to the amidase activity. In the case of B. cereus, where the full-length and truncated enzymes have almost equally high activity, we propose that strong binding of the CBD to the target cell wall releases the constraints on the catalytic domain. It is surprising, however, that localization of the enzymatic domain to the cell surface does not enhance the rate of lysis via a local concentration effect.
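The dual-function model can be stated semi-quantitatively with a simple two-state scheme, offered purely as an illustrative assumption that was not tested in this work. Suppose full-length PlyL interconverts between a closed (inhibited) and an open (active) conformation with equilibrium constant K = [open]/[closed], and only the open form binds the cognate cell-wall ligand W with dissociation constant Kd. The active fraction is then K(1 + [W]/Kd) / (1 + K(1 + [W]/Kd)): it stays small when K is small and wall binding is negligible (the B. subtilis case in the model), approaches 1 when binding is strong (the B. cereus case), and is 1 by definition for the truncated catalytic domain, which has no closed state. `active_fraction` and all parameter values are hypothetical.

```python
# Illustrative two-state sketch of the Fig. 5 model (an assumption, not fitted to data).
# closed <-> open with K = [open]/[closed]; only the open form binds the cognate
# cell-wall ligand W with dissociation constant kd (same concentration units as W).


def active_fraction(k_open: float, wall_conc: float, kd: float) -> float:
    """Fraction of full-length enzyme in active states (open + open-W complex)."""
    if k_open < 0 or wall_conc < 0 or kd <= 0:
        raise ValueError("k_open and wall_conc must be >= 0 and kd must be > 0")
    ratio = k_open * (1.0 + wall_conc / kd)  # ([open] + [open-W]) / [closed]
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Hypothetical parameters: mostly closed in the absence of cognate binding (K = 0.1).
    for label, w in (("no cognate binding", 0.0), ("strong cognate binding", 99.0)):
        print(label, round(active_fraction(0.1, w, kd=1.0), 3))
```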

Endolysins  are  generally  observed  to  be  highly 
specific  towards  a  particular  species  of  bacteria,  by 
virtue  of  their  distinct  CBDs  that  recognize  variable 
cell  wall  structures  (1,2).  Our  observation  that  the 
catalytic  domain  of  PlyL  has  strong  lytic  activity 
against  a  number  of  different  Bacillus  species  and 
that  this  activity  does  not  require  (or  is  inhibited  by) 
the CBD suggests either that the PlyL/PlyG family of endolysins is atypical or that the kinetics of lysis are different when the lysin is applied exogenously rather than endogenously. We note, however, that there are precedents for such behavior: certain phage hydrolases have been shown to maintain or even increase their exogenous lytic activity when the C-terminus is truncated (16-18). These findings raise the possibility of developing the catalytic domains of certain lysins as broad-spectrum therapeutic agents.
Table I: Crystallographic statistics

A. Data collection

                            Se-Met                                        Native
                            Peak          Remote        Inflection
Wavelength (Å)              0.9792        0.8919        0.9794          1.54
Resolution (Å)              2.03          2.03          2.03            1.86
Resolution range (Å)        30-2.03       30-2.03       30-2.03         30-1.86
                            (2.07-2.03)   (2.07-2.03)   (2.07-2.03)     (1.89-1.86)
Total observations          190756        182585        191392          188742
Unique reflections          39226         39277         39375           53283
Completeness (%)            98.9 (96.8)   98.4 (95.5)   96.9 (98.9)     100 (99.9)
Average I/σ(I)              19.1 (3.0)    17.0 (2.6)    18.4 (2.7)      23.6 (2.2)
Rsym (%)                    10.8 (44.2)   9.2 (45.3)    9.2 (46.5)      8.7 (52.4)

Figure of merit after SOLVE = 0.41

B. Refinement (native)

Refinement range (Å)            30.0-1.86
Number of reflections           48365
Rwork (%)                       20.8
Rfree (%)                       24.3
Number of refined residues      479
Number of water molecules       276
rmsd from ideality
  Bond lengths (Å)              0.007
  Bond angles (°)               1.5
Average B-value (Å²)            Molecule A    Molecule B    Molecule C
  Protein                       27.7          25.9          40.6
  Main chain                    26.2          24.3          39.5
  Side chain                    29.3          27.4          41.7
  Solvent                       34.3
Ramachandran plot (%)
  Most favoured                 85.8
  Additionally allowed          13.9
  Generously allowed            0.2
  Disallowed                    0.0

Figures in parentheses refer to the highest-resolution shell.

$R_{\mathrm{sym}} = \sum |I_h - \langle I_h \rangle| / \sum I_h$, where $\langle I_h \rangle$ is the average intensity over symmetry-equivalent reflections.
$R_{\mathrm{work}} = \sum |F_{\mathrm{obs}} - F_{\mathrm{calc}}| / \sum F_{\mathrm{obs}}$, where the summation is over the data used for refinement.
Rfree was calculated using 5% of the data excluded from refinement (Kleywegt, 1996).
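The R-factor definitions in the table footnotes translate directly into code. The following is a minimal sketch only, assuming intensities already grouped by unique reflection and amplitudes already on a common scale; the programs used here (SCALEPACK, CNS) add scaling and weighting details that are not reproduced. Rfree is the same R-factor evaluated on a randomly held-out test set (5% here). All function names are hypothetical.

```python
# Minimal sketch of the R-factor definitions in the Table I footnotes.
# Assumptions: intensities are grouped by unique reflection, amplitudes are on a common
# scale, and the free set is a simple random 5% selection.
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def r_sym(observations: Dict[str, Sequence[float]]) -> float:
    """R_sym = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i I_hi."""
    num = den = 0.0
    for intensities in observations.values():
        avg = mean(intensities)  # <I_h> over symmetry-equivalent observations
        num += sum(abs(value - avg) for value in intensities)
        den += sum(intensities)
    if den == 0:
        raise ValueError("total intensity is zero")
    return num / den


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum |F_obs - F_calc| / sum F_obs (R_work on the working set, R_free on the test set)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need matching, non-empty amplitude lists")
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def split_work_free(n_reflections: int, free_fraction: float = 0.05,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly set aside free_fraction of reflection indices as the R_free test set."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    return sorted(indices[n_free:]), sorted(indices[:n_free])
```

Because the test reflections never enter refinement, a gap between Rwork and Rfree (20.8% versus 24.3% in Table I) is the usual check against overfitting.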
Figure Legends


Figure 1. Sequence alignment of a family of Bacillus endolysins. Zinc-coordinating and active-site residues are colored cyan and red, respectively. Arrows and cylinders represent the β-strands and α-helices of the λ prophage Ba02 endolysin (PlyL). XlyB, XlyA, TP21 and φ105 are endolysins from B. licheniformis, B. subtilis, B. cereus and B. subtilis, respectively. The alignment was performed using the program CLUSTALX version 1.82 (19).

Figure 2. Three-dimensional structure of PlyL and related amidases. Molscript (version 2.1; (20,21)) ribbon representations of the structures of the Bacillus endolysin, T7 lysozyme (PDB: 1LBA), PGRP-LB (PDB: 1OHT) and AmpD (PDB: 1J3G). The zinc ion is shown as a grey sphere. Colors indicate secondary structure elements. The three molecules in the asymmetric unit are essentially identical in structure, with backbone (Cα) RMS deviations of 0.29 Å. The backbone RMS differences with T7 lysozyme and PGRP-LB are 1.8 Å (for 107 atoms) and 2.0 Å (for 106 atoms), respectively.

Figure 3. Stereo views of PlyL and active-site comparisons. (A) Stereo Cα representation of PlyL. Amino acid differences between PlyL and PlyG are indicated. Most of these are surface-exposed, except for Val55, which makes hydrophobic contacts with Trp68 in PlyL. In PlyG, Val55 is replaced by the larger residue Ile, but this is complemented by a change to the smaller Leu in place of Trp68. (B) Stereo view of the active-site residues of PlyL (light gray), T7 lysozyme (PDB: 1LBA) (medium gray), and PGRP-LB (PDB: 1OHT) (dark gray).

Figure 4. Lytic and cell wall-binding activity of PlyL. Lysis of viable cells of four different Bacillus species by (A) full-length PlyL and (B) the N-terminal catalytic domain. The protein concentration was 0.4 µM except for B. anthracis, where 2 µM was used. (C) The time required for full-length PlyL and its catalytic domain to reduce the OD600 by half (t1/2). Error bars indicate the standard deviation from at least three independent experiments. (D) Confocal image of the GFP-CBD fusion protein binding to the cell wall of B. cereus, showing rod-shaped cells with green fluorescence. No fluorescence was observed for other Bacillus species or for the control with GFP alone (data not shown).

Figure 5. A proposed model of species-specific activation of PlyL. (A) In unbound full-length PlyL, the C-terminal domain (gray oval) suppresses the catalytic activity of the N-terminal domain (blue square) by preventing access of the peptidoglycan substrate (light blue circles) to the active site, either sterically or allosterically. (B) An alternative, active conformation of PlyL is stabilized by specific interactions of the C-terminal domain with a cell wall component (black cross) characteristic of cognate bacteria (such as B. cereus). In the absence of such an interaction partner, as in the case of B. subtilis, B. megaterium or free peptidoglycan in vitro, full-length PlyL would exist mostly in the inactive (closed) conformation. (C) Truncation of the C-terminal domain maintains the enzyme in a constitutively active form.
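The Figure 2 legend quotes backbone (Cα) RMS deviations between the three molecules in the asymmetric unit and against T7 lysozyme and PGRP-LB. Such values are normally computed after optimal rigid-body superposition of matched atoms. Below is a minimal NumPy sketch of the standard Kabsch superposition, assuming the matched Cα coordinates have already been extracted and paired; selecting the matched pairs (for example the 107 atoms used for T7 lysozyme) is a separate step not shown, and the program actually used for these comparisons is not stated in the text. `kabsch_rmsd` is a hypothetical name.

```python
# Minimal sketch: RMSD after optimal superposition (Kabsch algorithm) with NumPy.
# Assumption: p and q are (N, 3) arrays of already-matched atom coordinates.
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal rotation + translation."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Covariance matrix H = sum_i p_i q_i^T and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q (row vectors, hence the transpose) and compute the RMSD.
    p_rot = p_c @ rot.T
    return float(np.sqrt(((p_rot - q_c) ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a coordinate set should give an RMSD of essentially zero, which is a convenient sanity check before comparing real structures.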
Supplementary Figure 1. (A) HPLC analysis of DNP-labeled amino groups after digestion of B. subtilis peptidoglycan with purified PlyL. A significant amount of DNP-alanine was observed in the released fraction. (B) Time course of alanine release from peptidoglycan by the N-terminal domain and by full-length PlyL.

[Supplementary Figure 1 image: panel A, HPLC chromatograms of the experiment and control samples with the Ala peak labeled (x-axis: retention time, min); panel B, time-course plot (not recoverable from the scan).]




Principal  Investigator:  Liddington,  Robert  C. 


Structural  and  Functional  studies  of  a  Bacillus  anthracis  sensor  domain 

Gudrun R. Stranzl, Marcin Grynberg, Chandra La Clair, Dorinda Shoemaker, Robert Schwarzenbacher, Eugenio Santelli, Adam Godzik, Marta Perego, Robert C. Liddington


Infectious & Inflammatory Disease Research Center (IIDC), The Burnham Institute, La Jolla, CA 92037, USA

Department of Genetics, Institute of Biochemistry and Biophysics, Pawinskiego 5A, 02106 Warsaw, Poland

The Scripps Research Institute, Department of Molecular and Experimental Medicine, Division of Cellular Biology, La Jolla, CA 92037, USA

Correspondence: iiiddinu:lon@biirnham.org


Summary 


We have determined the crystal structures of two proteins, BAS-1 and BAS-2, which are encoded by the genes pXO1-118 and pXO2-61 of the Bacillus anthracis virulence plasmids. The gene pXO1-118 belongs to the pathogenicity island on the pXO1 plasmid. Both structures adopt a globin fold, and their amino acid sequences reveal a conserved sensory motif, KlAxER, found in sensor histidine kinases from various Bacillus species and in a so-called trans-acting positive regulator from Bacillus cereus. In the BAS-1 structure, density corresponding to a putative ligand was observed; GC-MS identified this ligand as palmitic acid. Isothermal titration calorimetry (ITC) experiments showed reasonable binding of fatty acids to BAS-1 and BAS-2. These studies indicate that BAS-1 and BAS-2 function as fatty-acid binders and may play a role in fatty acid transport or regulation within the cell.

Running  Title 

Introduction 

Bacillus anthracis, a spore-forming Gram-positive bacterium, is the causative agent of anthrax, a potentially lethal infectious disease of humans and animals. Fully virulent forms of B. anthracis carry two plasmids, pXO1 (182 kb) and pXO2 (96 kb), which encode major virulence factors such as those responsible for toxin production and capsule formation. The transcription and synthesis of anthrax toxin, capsule and certain chromosomal genes are regulated by atxA (anthrax toxin activator) [Uchida, 1993 #16], which is located on pXO1, and by its homologue acpA (anthrax capsule activator) [Vietri, 1995 #52], which is located on pXO2. The ORF of BAS-1 (Bacillus anthracis sensor domain, encoded by the pXO1-118 gene) lies 358 base pairs from atxA (Figure 1-A) and is transcribed in the opposite direction to atxA. The ORF of BAS-2 (encoded by the pXO2-61 gene) lies 5658 base pairs away from acpA (Figure 1-C), and pXO2-61 (BAS-2) has been shown to be regulated by atxA [Bourgogne, 2003 #45]. BAS-1 and BAS-2 share 61% amino acid sequence identity; however, BAS-1 has a higher sequence identity, 81%, to a homologous protein (locus ZP_00236329) [Hoffmaster, 2004 #47] from Bacillus cereus. B. cereus, a spore-forming Gram-positive bacterium found in soil and many other sources, is an opportunistic pathogen that causes food poisoning manifested by diarrhoeal or emetic syndromes. A few genes from the pXO1 pathogenicity island [Okinaka, 1999 #57], pXO1-96 to pXO1-127, also appear to be present in various B. cereus group strains [Read, 2003 #46]. Interestingly, the ORF of the B. cereus homologue ZP_00236329 (gene BCE_G9241_pBC218_0049), which is highly similar to BAS-1, lies 349 base pairs from the B. cereus atxA gene (Figure 1-B) [Hoffmaster, 2004 #47].
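The identity values above (61% and 81%) are percentages of aligned positions that carry the same residue. A minimal sketch of that calculation, assuming two sequences that are already aligned with '-' marking gaps and normalising by the number of gap-free columns (alignment tools differ in the denominator they use, for example the full alignment length); the function name and the toy sequences are hypothetical:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Identity is normalised by the number of columns in which neither
    sequence has a gap (an assumed convention; tools differ).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy fragments, not the real BAS-1 or ZP_00236329 sequences.
print(percent_identity("MEATKRYLEE", "MEATKRFIEE"))  # 80.0
```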

By searching GenBank at the NCBI with the amino acid sequence of BAS-1, using PSI-BLAST with default parameters, we identified the N-terminal sensor domains of several bacterial sensor histidine kinases, although their sequence identity to BAS-1 is only 27%. All aligned sequences share a KIAxER motif. These sensor histidine kinases were found in Bacillus anthracis, Bacillus cereus and Bacillus thuringiensis. Sensor histidine kinases, which belong to two-component regulatory systems, are composed of two domains: an N-terminal signal input domain that often possesses sub-domains with recognized signaling functions, and a C-terminal autokinase domain [Stephenson, 2002 #17]. The sensing domain monitors changes in light, redox potential and small ligands; these signals can in turn trigger protein-protein interactions and DNA binding that activate and/or repress transcription of specific genes [Stock, 2000 #49].
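The shared KIAxER motif can be written as a pattern in which x stands for any single residue. A minimal sketch, assuming plain one-letter protein sequences; the function name is hypothetical and the example sequence is an invented stand-in, padded so that the motif starts at position 69 as in the BAS numbering:

```python
import re

# KIAxER: Lys-Ile-Ala, any one residue (x), then Glu-Arg.
KIAXER_PATTERN = re.compile(r"KIA[A-Z]ER")


def find_kiaxer(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for each KIAxER occurrence."""
    return [(m.start() + 1, m.group(0)) for m in KIAXER_PATTERN.finditer(sequence.upper())]


# Hypothetical sequence, padded so that the motif starts at residue 69.
example = "A" * 68 + "KIAQER" + "LLG"
print(find_kiaxer(example))  # [(69, 'KIAQER')]
```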

A two-component signal transduction system composed of the sensor kinase DesK and the response regulator DesR has been reported to control cold induction of the des gene, which encodes the Δ5-lipid desaturase of Bacillus subtilis. In this system, unsaturated fatty acids (UFAs) act as negative signalling molecules for des transcription. The authors further reported that the difference in potency between 16:1Δ5 and the other fatty acids tested strongly suggests that fatty acids with a double bond at the Δ5 position act as specific signals regulating DesK-DesR signal transduction [Aguilar, 2001 #50].

Two-component regulatory proteins are also involved in the initiation of sporulation in Bacillus subtilis. There, a signal activates the autophosphorylation of the histidine kinases KinA and KinB, which transfer the phosphoryl group to Spo0F, a single-domain response regulator. Phosphorylated Spo0F passes the phosphate to the final transcriptional regulator, Spo0A, through the phosphotransferase Spo0B [Tzeng, 1997 #59].
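The order of phosphoryl transfer in this phosphorelay can be written as a toy model. This is only a sketch: it ignores kinetics, phosphatases and reverse transfer, treats KinA and KinB as interchangeable, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    phosphorylated: bool = False


def run_phosphorelay(signal_present: bool) -> list[str]:
    """Toy relay: kinase -> Spo0F -> Spo0B -> Spo0A.

    Each step moves the phosphoryl group to the next protein and leaves the
    donor unphosphorylated. Returns the proteins that carried it, in order.
    """
    kinase = Protein("KinA")  # KinB would behave identically in this model
    relay = [Protein("Spo0F"), Protein("Spo0B"), Protein("Spo0A")]
    if not signal_present:
        return []  # no signal: no autophosphorylation, nothing to relay
    kinase.phosphorylated = True  # signal-triggered autophosphorylation
    path = [kinase.name]
    donor = kinase
    for acceptor in relay:
        donor.phosphorylated = False
        acceptor.phosphorylated = True
        path.append(acceptor.name)
        donor = acceptor
    return path


print(run_phosphorelay(True))  # ['KinA', 'Spo0F', 'Spo0B', 'Spo0A']
```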

The existence of antibiotic-resistant bacterial strains, which arise either naturally or through deliberate engineering, emphasizes the need for alternative therapeutic approaches. Vaccines are typically problematic for the prophylactic treatment of large civilian groups because of possible side effects. Two-component systems and phosphorelays have been recognized as targets for antimicrobial intervention [Stephenson, 2002 #58]. It is therefore important to learn more about the structure and function of the proteins involved in the molecular mechanisms of Bacillus anthracis virulence and pathogenicity.
In this study, we have determined the structures of BAS-1 and BAS-2 by X-ray crystallography. After identifying by GC-MS a fatty acid bound to BAS-1, we show by isothermal titration calorimetry that BAS-1 and BAS-2 bind saturated as well as unsaturated fatty acids. A gel filtration experiment also indicates that BAS-1 does not bind AtxA. Further experiments will be needed to establish the function of these new homologues of the sensor domains of sensor histidine kinases.

Results  and  Discussion 

Structure  of  Bacillus  anthracis  BAS-1  protein 

We crystallized the full-length Bacillus anthracis BAS-1 protein and determined its structure to 1.76 Å resolution by selenium single-wavelength anomalous dispersion (SAD) phasing. The asymmetric unit contains one molecule, which forms a crystallographic dimer with a symmetry-related molecule. The model includes amino acid residues -2 to 150 (of 150 total residues), including 3 residues (Gly, Ser, His) from the N-terminal His-tag. Each monomer of the B. anthracis BAS-1 dimer consists of a single structural domain of six helices that form a so-called globin fold. Helices 1 to 4 are twisted by 45° relative to helices 5 and 6, and the C-terminus forms a single-turn α-helix (figure 2-B).

The putative active site, formed by the KIAxER motif, is located on helix α4. Figure 5-C shows the motif residues (K-69, I-70, A-71, R-74), which are involved in a complex network of hydrogen bonds and salt bridges. R-74 is buried in the cavity through salt bridges with D-33 (OD1/NH2 3.53 Å and OD2/NH1 2.77 Å) and with the fatty acid ligand 11A, and through hydrogen bonds with HOH-95 and HOH-63. HOH-63 is in turn hydrogen bonded to R-74, E-73 and the ligand 11A. This network stabilizes R-74 in a hydrophobic environment that would otherwise force its side chain to point out toward the protein surface; the salt bridges and hydrogen bonds keep R-74 buried in the cavity. A small piece of electron density within coordinating distance of the carboxyl oxygen of ligand 11A is very likely a potassium ion. K-69 forms salt bridges with E-38 and E-73 and also interacts with N-42; all of these contacts are within 3 Å.

The dimerization interface involves both hydrophobic and charged interactions, mainly between residues of helices α5 and α6: Ala-82; Ile-85, -113, -124 and -128; Asn-89 and -117; Lys-92, -93, -110, -135 and -136; Leu-96; Phe-100; Gly-106; Cys-109; Glu-114 and -121; and Tyr-131 and -132 (figure 3-A). The two Cys-109 residues of chains A and B are 4.2 Å apart, and no disulfide bond is observed in the electron density maps. Within each monomer, Asn-117 interacts with Lys-92; this contact lies within the dimer interface but does not link the two subunits. Figure 2-B shows the crystallographic BAS-1 dimer, which is consistent with gel filtration experiments in which BAS-1 behaves as a stable dimer in solution (figure 2-D).
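Salt bridges such as the D-33/R-74 contacts (3.53 Å and 2.77 Å) are assigned from interatomic distances. A minimal sketch, assuming Cartesian coordinates in Å and a commonly used 4.0 Å carboxylate-oxygen to basic-nitrogen cutoff; the cutoff and the function names are assumptions, not the criteria used for this structure:

```python
import math

Coord = tuple[float, float, float]


def distance(a: Coord, b: Coord) -> float:
    """Euclidean distance between two atoms, in the units of the coordinates (Å)."""
    return math.dist(a, b)


def is_salt_bridge(acid_oxygen: Coord, base_nitrogen: Coord, cutoff: float = 4.0) -> bool:
    """Flag an Asp/Glu oxygen and Lys/Arg nitrogen pair as a salt bridge if within `cutoff` Å."""
    return distance(acid_oxygen, base_nitrogen) <= cutoff


# Hypothetical coordinates placed to reproduce the reported D-33/R-74 distances.
print(is_salt_bridge((0.0, 0.0, 0.0), (3.53, 0.0, 0.0)))  # True
print(is_salt_bridge((0.0, 0.0, 0.0), (2.77, 0.0, 0.0)))  # True
```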

BAS-1 contains a cavity (figure 5-A) lined mostly with hydrophobic residues, which are listed in table 2. Phe-19 (figure 5-E) adopts two conformations and appears to stabilize the ligand in the cavity. The volume of the cavity is about 123 Å³ (calculated with VOIDOO [Kleywegt, 1994 #39]); it is about 20 Å long (distance between Arg-74 NE and Phe-19 CB) and 8 Å wide (Phe-84 CD1 to Trp-23 CH2). Apart from waters 62, 63, 65 and 95, no water molecules are found in this region. Additional electron density close to the ligand is assigned, on the basis of B values, to a potassium ion. In figure 5-A the cavity is shown as a pink grid, and the ligand is shown in ball-and-stick representation together with the residues close to the cavity entrance (His-30, His-32, Val-79, Glu-83).

While the core of the globin fold is dominated by hydrophobic residues, the two major surfaces of the dimer show a distinct polarity in the distribution of charged residues. The surface of the BAS-1 dimer carries 18 of 30 glutamate residues, 8 of 12 aspartate residues, 6 of 8 arginine residues and 24 of 38 lysine residues, resulting in a slightly negative net charge of the dimer. Figure 3-B shows the electrostatic potential surface maps, in which the charged residues are grouped into zones E1 to E4. Zone E1 is a small positively charged region formed by Lys-5, Arg-6, Tyr-7 and Arg-55. Zone E2 is a larger negatively charged region formed by Glu-17, Glu-38, Glu-57, Asp-62, Glu-64, Asp-65, Glu-73, Asp-83, Glu-90 and Asp-121. Zone E3, at the C-terminus of each monomer, is positively charged and formed by Lys-78, Lys-133, Lys-135, Lys-136, Lys-146, Lys-147 and Lys-148. Zone E4 is a smaller positively charged region formed by Lys-18, Lys-24, Lys-25, Arg-26, Lys-41, Arg-55 and Lys-93.

A DALI [Holm, 1993 #29] search showed the closest structural homology to a dimeric oxygen sensor from Bacillus subtilis (PDB 1OR4, [Zhang, 2003 #40]) (figure 4-A), a light-harvesting protein (PDB 1QGW, [Wilk, 1999 #41]) and a dimeric bacterial hemoglobin from Vitreoscilla sp. (PDB 2VHB, [Tarricone, 1997 #42]). Based on its three-dimensional structure, BAS-1 therefore belongs to the globin superfamily. These hits, about 150 residues long, superimpose on BAS-1 with RMSDs within 3.0 Å, although their sequence identity to BAS-1 is only around 13% (figure 4-B); a globin fold would therefore never have been identified by primary sequence homology searches. Superposition with the Vitreoscilla hemoglobin reveals a slight difference in the hydrophobic pocket: the BAS-1 pocket has no heme cofactor and is also slightly narrower than that of the B. subtilis oxygen sensor (figure 4-A). Because BAS-1 shares its fold with heme-binding proteins, we attempted to reconstitute it with hemin, but this failed, presumably because of the differences in the hydrophobic pocket compared with the B. subtilis oxygen sensor. Helix 3 of BAS-1 lies closer to helix 4 than the corresponding helices E and F do in the B. subtilis oxygen sensor structure (figure 4-A). In the superposition, Leu-92, Ile-83, Tyr-70, Phe-69, Leu-96, His-123 and Thr-95 of the B. subtilis oxygen sensor correspond to Ile-39, Lys-36, Lys-24, Trp-23, Gly-43, Arg-74 and Asn-42 of BAS-1, respectively. Ile-39 and Asn-42 would clash with a bound heme, which may explain why reconstitution with hemin failed. His-123 of the B. subtilis oxygen sensor is the residue that coordinates the heme iron; Arg-74 of BAS-1 occupies the same position, suggesting that the ligand-binding pockets are well conserved. BAS-1 and BAS-2 may therefore form a new subfamily within the globin superfamily, with architectures similar to those of the globins but quite different amino acid sequences.


Description  of  the  BAS-1  ligand. 

The BAS-1 structure contains additional electron density indicative of a non-covalently bound ligand. Figure 5-E shows undecanoic acid fitted into this electron density; its long hydrophobic tail fits well into the hydrophobic cavity of BAS-1. His-tagged BAS-1 protein was mixed with crude extract from Bacillus cereus and incubated for one hour, after which BAS-1 was purified and crystallized; the same ligand was still observed. Gas chromatography mass spectrometry (GC-MS) analysis revealed a mass pattern indicative of hexadecanoic acid. Because the hydrophobic tail of hexadecanoic acid is flexible and can be modeled only in part, the shorter undecanoic acid was used in the model. The chemical environment of the cavity is also consistent with a fatty acid ligand: its hydrophobic tail contacts a hydrophobic residue (Phe-19), which is seen in two conformations, and its carboxylic group is within coordinating distance of an arginine (Arg-74). The question then arises where the ligand comes from, since we did not add hexadecanoic acid to any buffer solution during protein purification. We think that BAS-1 binds hexadecanoic acid during cell disruption, as this fatty acid is part of the cell walls of all Gram-negative bacteria [Kaneda, 1967 #43]. Because the ligand was still present after purification in different buffer solutions, we think that it may be a physiological ligand.


Structure  of  Bacillus  anthracis  BAS-2  protein 

We crystallized the full-length Bacillus anthracis BAS-2 protein and determined its structure to 1.49 Å resolution by molecular replacement, using the BAS-1 model. The asymmetric unit contains two molecules, which form a dimer. The dimerization interface involves both hydrophobic and charged interactions, mainly between residues of helices α5 and α6: Ala-82; Ile-85, -93 and -124; Asn-89 and -117; Lys-92, -114 and -135; Met-96; Thr-100; Leu-106; Gln-110; Val-113 and -128; Tyr-109, -131 and -132; and Asp-121 (figure 3-A). Within each BAS-2 monomer, Lys-92 interacts with Asn-117. The model includes residues 5-136 (of 136 total residues); the 3 residues (Gly, Ser, His) from the N-terminal His-tag and 4 residues of BAS-2 itself (Met-1, Glu-2, Glu-3, Ile-4) are not visible in the electron density. As in BAS-1, each monomer of the B. anthracis BAS-2 dimer (figure 2-C) consists of a single structural domain of six helices that form a so-called globin fold. Cys-6 and Cys-9 of chain A form a disulfide bond (3.2 Å) that is observed in the electron density maps. The disulfide bond appears to distort the helix at the N-terminus of the sequence, which may explain why this region is more flexible and why no density is observed for residues 1 to 4.

As in BAS-1, the putative active site, formed by the KIAxER motif, is located on helix α4. Figure 5-D shows the motif residues (K-69, I-70, A-71, R-74), which are involved in a complex network of hydrogen bonds and salt bridges. R-74 is buried in the cavity through a salt bridge with D-33 and through hydrogen bonds with HOH-17 and HOH-23, and HOH-23 also bridges R-74 and E-73. This network stabilizes R-74 in a hydrophobic environment that would otherwise force its side chain to point out toward the protein surface, so R-74 stays buried in the cavity. A small piece of density, very likely an iodide ion, is observed but is too far away to coordinate R-74 or the waters. K-69 forms salt bridges with E-38 and E-73 and also interacts with N-42; all of these contacts are within 3 Å.

BAS-2 also contains a cavity (figure 5-B) lined mostly with hydrophobic residues, which are listed in table 2. The volume of the cavity is about 123 Å³ (calculated with VOIDOO [Kleywegt, 1994 #39]); it is about 26 Å long (distance between Arg-32 NH1 and Ile-95 CD1) and 8 Å wide (Trp-23 CH2 to Ile-70 CD1). No water or ligand molecules were found in the cavity. In figure 5-B the cavity is shown as a pink grid, and the residues at its entrance (Arg-30, Arg-32, Val-79, Glu-83) are shown in ball-and-stick representation.

A DALI [Holm, 1993 #29] search identified the same closest structural homologues as for BAS-1: the oxygen sensor from Bacillus subtilis (PDB 1OR4, [Zhang, 2003 #40]), a light-harvesting protein (PDB 1QGW, [Wilk, 1999 #41]) and a dimeric bacterial hemoglobin from Vitreoscilla sp. (PDB 2VHB, [Tarricone, 1997 #42]). These hits lie within an RMSD of 3.0 Å.

The electrostatic surface potential of BAS-2 (figure 3-C) shows four large charged patches. The negatively charged patch E1 is formed by Asp-33, Asp-76, Glu-37, Glu-38 and Glu-73. The positively charged patch E2 is formed by Lys-24 and Lys-25, and a second positively charged patch, E3, by Arg-30, Lys-78 and Lys-135. A large positively charged patch, E4, is formed by Lys-5, Arg-10, Lys-13, Lys-55, Lys-92, Lys-114, Lys-115 and Lys-117. Altogether, 7 of 14 glutamate, 4 of 5 aspartate, 11 of 18 lysine and 2 of 6 arginine residues are on the surface, resulting in a slightly positive net charge of the monomer.
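The net-charge statement can be checked by simple counting. A rough sketch, assuming formal charges of -1 for Glu/Asp and +1 for Lys/Arg at neutral pH and ignoring histidines and the termini (a simplification); the dictionary and function names are hypothetical, and the counts are those given in the text:

```python
# Charged residues in the BAS-2 monomer as given in the text:
# residue -> (number on the surface, total number in the monomer).
BAS2_CHARGED = {
    "Glu": (7, 14),
    "Asp": (4, 5),
    "Lys": (11, 18),
    "Arg": (2, 6),
}

# Formal side-chain charges at neutral pH (His and termini ignored: an assumption).
CHARGE = {"Glu": -1, "Asp": -1, "Lys": +1, "Arg": +1}


def formal_net_charge(counts: dict[str, tuple[int, int]], use_surface: bool = False) -> int:
    """Sum formal charges over either the surface counts or the total counts."""
    index = 0 if use_surface else 1
    return sum(CHARGE[residue] * pair[index] for residue, pair in counts.items())


print(formal_net_charge(BAS2_CHARGED))                    # 5 (whole monomer)
print(formal_net_charge(BAS2_CHARGED, use_surface=True))  # 2 (surface residues only)
```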


Sequence  and  Structure  homology  of  BAS-1  and  BAS-2 

A BLAST search with the amino acid sequence of BAS-1 identified, with highly significant E-values, the Bacillus anthracis homologue BAS-2, the Bacillus cereus homologue ZP_00236329 (gi:47565287) and the N-terminal domains of several Bacillus sensor histidine kinases. BAS-1 shares 62% amino acid sequence identity with BAS-2, 81% with ZP_00236329 and 27% with the N-terminal domains of the sensor histidine kinases; BAS-2 shares 61% identity with ZP_00236329. All aligned sequences (figure 3-A) share the KIAxER motif, which corresponds to Lys-69, Ile-70, Ala-71, Glu-73 and Arg-74 in the BAS structures (figure 6). Pro-34 and Pro-104 are conserved in both BAS structures, and both are responsible for turns in the secondary structure. An interesting difference is residue 91, a glycine in BAS-1 and an alanine in BAS-2, which lies on helix α5 and lines the hydrophobic cavity in both proteins; BAS-1 and BAS-2 could therefore differ in their affinity for fatty acids. One significant structural and sequence difference between BAS-1 and BAS-2 is the C-terminal extension: BAS-2 is 11 amino acids shorter, so its C-termini do not intersect. The BAS-1 and BAS-2 structures superimpose with an RMSD of 0.753 Å, calculated with LSQMAN [Kleywegt, 2001 #20]. Although the Cα superposition of BAS-1 and BAS-2 (figure 2-A) demonstrates how similar the Cα traces are, the electrostatic potential surfaces of the BAS-1 and BAS-2 dimers look different. In particular, the major positively charged patches, E3 in BAS-1 and E4 in BAS-2 (figure 3), are placed differently on the two surfaces, which may explain the different behavior of the two proteins during purification and crystallization.
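An RMSD such as the 0.753 Å quoted above is the root-mean-square distance between paired atoms after the structures have been superposed. A minimal sketch of that final step only, assuming the two coordinate sets are already optimally superposed (the least-squares fitting that a program like LSQMAN performs is not implemented here); the function name and the toy traces are hypothetical:

```python
import math

Coord = tuple[float, float, float]


def rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """Root-mean-square deviation (Å) between paired atoms.

    Assumes the two coordinate sets are already optimally superposed;
    the least-squares fitting step itself is not implemented here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two equally long, non-empty coordinate lists")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical three-atom Cα traces offset by 1 Å along z.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(trace_a, trace_b))  # 1.0
```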

To compare the electrostatic surface potentials of the aligned proteins in figure 3-A, the residues of each sequence were mapped onto the BAS-1 model, which was modified at the N- and C-termini for better comparison; for orientation, helices α1 through α4 are exposed. The electrostatic surface potentials of BAS-1 and its B. cereus homologue ZP_00236329 are very similar. BAS-2 shares some patches with BAS-1, but the N-terminal domains of the Bacillus sensor histidine kinases are much more negatively charged than the BAS proteins (figure 3-D). Within the histidine kinases, however, these domains are highly conserved. The models Ba.c.3 and Ba.a. have the same amino acid sequence, whereas Ba.c.1 differs by two substitutions (V45A and N120A), Ba.c.2 by two (E119D and N125A) and Ba.th. by eight (V13I, D37E, R42K, F55L, I60T, E63D, N120A and Q121K).

Figures 5-C and 5-D show that the water molecules in the putative active sites of BAS-1 and BAS-2 are highly conserved. However, we observe a potassium ion in the BAS-1 structure and an iodide ion in the BAS-2 structure, which raises the possibility that the charge and size of the iodide ion prevented fatty acid binding in the BAS-2 crystal. Fatty acid binding was, however, observed in the ITC experiments, in which no iodide was present.
ITC binding studies with BAS-1 and BAS-2

We measured the affinity of BAS-1 and BAS-2 for saturated (myristic, palmitic), saturated branched (12-methyltetradecanoic, 13-methyltetradecanoic) and unsaturated (palmitoleic) fatty acids by ITC (Fig. 7). All binding data were best described by a one-to-one stoichiometry model, and we detected no selectivity among the acids tested: all of them bound BAS-1 and BAS-2 exothermically (Fig. 7), with very similar affinities (Table 3).
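In a one-to-one model, a dissociation constant fixes how much protein is occupied at given total concentrations. The sketch below, with hypothetical concentrations and a hypothetical Kd (it is not the fitting routine behind Fig. 7 or Table 3), solves the 1:1 equilibrium and converts a Kd into a standard binding free energy, assuming a 1 M standard state:

```python
import math


def complex_concentration(p_total: float, l_total: float, kd: float) -> float:
    """Equilibrium concentration of a 1:1 protein-ligand complex.

    Solves Kd = (P_total - PL)(L_total - PL) / PL for PL, taking the physically
    meaningful root of the quadratic. All inputs must share the same units.
    """
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def fraction_bound(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of the protein occupied by ligand."""
    return complex_concentration(p_total, l_total, kd) / p_total


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol: dG = RT ln(Kd), Kd in mol/l."""
    r = 8.314462618e-3  # gas constant, kJ/(mol K)
    return r * temperature_k * math.log(kd_molar)


# Hypothetical values: 10 µM protein, 10 µM fatty acid, Kd = 1 µM.
print(round(fraction_bound(10.0, 10.0, 1.0), 3))  # 0.73
print(round(binding_free_energy(1e-6), 1))        # -34.2
```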


AtxA binding and enzyme activity assays

To test whether BAS-1 binds AtxA, both proteins were expressed, mixed and applied to a size-exclusion column. No AtxA/BAS-1 complex was observed; the two proteins eluted separately. In addition, we performed an electrophoretic mobility shift assay (EMSA) with BAS-1, AtxA and the pagA promoter DNA in the presence and absence of Na2CO3. We tried different concentrations of BAS-1, AtxA and pagA promoter DNA, as well as different combinations such as AtxA without BAS-1, and again observed no binding. We conclude that BAS-1 does not bind AtxA.

BAS-1 and BAS-2 were also tested for hydrolase and oxidoreductase activity, with negative results.

Gene  deletion  analysis 


In an attempt to define a physiological function for the product of ORF118, a 34F2 derivative strain carrying a spectinomycin resistance cassette in place of ORF118 was constructed. This strain, named 34F2Δ118, did not show any growth or sporulation defect compared with the parental strain 34F2 (Fig. 8-A and data not shown). Both strains were transformed with the pTCVlac construct carrying the atxA promoter, and transcription of this gene was analyzed by β-galactosidase assays. As shown in Fig. 8-A, no difference in transcription was observed between the parental strain and 34F2Δ118, indicating that ORF118 does not affect AtxA production. Consistent with this, the product of ORF118 did not affect the transcription of the pagA gene encoding the protective antigen (our unpublished results).
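Promoter activity in β-galactosidase assays of this kind is commonly expressed in Miller units. A minimal sketch, assuming the classical Miller formula with an OD550 correction for light scattering by cell debris (the exact normalisation used for Fig. 8 is not stated here, and the function name and readings are hypothetical):

```python
def miller_units(od420: float, od550: float, od600: float,
                 minutes: float, volume_ml: float) -> float:
    """Beta-galactosidase activity in Miller units.

    1000 * (OD420 - 1.75 * OD550) / (t * V * OD600), where t is the reaction
    time in minutes and V the volume of culture assayed in ml.
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


# Hypothetical readings for one time point (not data from Fig. 8).
print(round(miller_units(0.5, 0.0, 1.0, 20.0, 0.1), 1))  # 250.0
```

Comparing the parental and deletion strains then amounts to computing this value for each strain at each time point.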
Yeast two-hybrid system analysis


The yeast two-hybrid system (Clontech) was used to test whether ORF118 could interact with AtxA. Each gene was cloned both into the bait plasmid pGBT9 and into the prey plasmid pGAD424. When the interaction assays were carried out in the yeast strain AH109, we detected an interaction in the control strain carrying ORF118 on both the pGBT9 and pGAD424 plasmids, but we did not detect any interaction with AtxA, either as bait or as prey. These results confirm that ORF118 can dimerize but do not support the hypothesis that it interacts with AtxA.

Gene  transcription  analysis 


The transcription profiles of the ORF118 and ORF61 promoters were determined by β-galactosidase assays of promoter-lacZ fusion constructs. The pTCVlac plasmid derivatives carrying either the ORF118 or the ORF61 promoter were transformed into the Sterne strain 34F2 or into its derivative carrying a deletion of the atxA gene (34F2ΔatxA). The results of this analysis are shown in Figure 8-B. Transcription from both promoters was induced in late exponential phase and increased during the early hours of stationary phase. The absence of AtxA prevented this induction from the ORF61 promoter but not from the ORF118 promoter. A similar pattern of transcription was observed when the cells were grown in Schaeffer's sporulation medium, which induces sporulation of B. anthracis cells at a faster rate than LB medium. Thus, while transcription of the ORF118 gene is independent of AtxA, transcription of ORF61 depends on this virulence factor, as previously indicated by a microarray study [Bourgogne, 2003 #45].
Experimental Procedures


Cloning,  expression  and  purification 

The plasmid pXO1 from Bacillus anthracis Sterne strain (provided by Philip Hanna) served as the template for cloning the hypothetical protein BAS-1. The forward and reverse primers for pXO1-118 (5'-GAGTGGACATATGGAAGCAACAAAACG-3' and 5'-CTATAGGATCCAAAAATTTCAAGGTG-3') were used for amplification. The ORF encoding BAS-1 was then cloned into the NdeI/BamHI sites of a pET-28a vector (Novagen), transformed into BL21(DE3) cells and plated on selective medium containing kanamycin. Colonies were grown at 37 °C and used to inoculate 500 ml of LB medium (Bacto™ Yeast Extract and Bacto™ Tryptone from Difco, and NaCl; pH adjusted to 7.5). Cultures were grown at 32 °C overnight to an optical density at 600 nm of about 0.6-0.7. Protein expression was induced by adding isopropyl-β-D-thiogalactopyranoside (IPTG) to 0.1 mM, and shaking continued for a further 4 h at 220 rpm and 32 °C. For selenomethionine (SeMet) incorporation, BAS-1 was expressed in E. coli BL21(DE3) [Harrison, 1994 #56]. Cells were harvested by centrifugation at 6000 rpm and the supernatant was discarded. The cell pellets were resuspended in lysis buffer (20 mM Tris pH 7.4, 0.5 M NaCl, 5 mM imidazole, 1% Triton), sonicated and centrifuged at 16000 rpm for 10 min. The supernatant was applied to a nickel column (Amersham Pharmacia). BAS proteins were eluted with 250 mM imidazole, 0.5 M NaCl, 20 mM Tris-HCl pH 7.4 and then dialyzed overnight against 1 M NaCl, 20 mM Tris-HCl pH 7.4. After the His-tag was cleaved with thrombin, the solution was concentrated and further purified on a Superdex 75 gel filtration column connected to an ÄKTA FPLC (Amersham Pharmacia). The protein was concentrated (Amicon), dialyzed into 20 mM Tris-HCl buffer pH 7.4, 1 M NaCl, 50 µM KCl, 5 mM DTT and flash-frozen in liquid nitrogen for long-term storage at -80 °C. On SDS-PAGE the protein runs at the expected molecular weight of 18.51 kDa, which was also confirmed by MALDI-TOF. On a sizing column (Superdex 75) the estimated molecular weight is ~32 kDa, suggesting a dimer in solution. The synthetic pXO2-61 gene (encoding BAS-2) was obtained from GenScript Co. (NJ, USA) and subcloned, expressed and purified as described above. BAS-2 was dialyzed into 20 mM Tris-HCl buffer pH 7.4, 500 mM NaCl, 50 µM KCl, 5 mM DTT, concentrated to 16 mg/ml and flash-frozen in liquid nitrogen for storage at -80 °C. On SDS-PAGE it runs at the expected molecular weight of 16.4 kDa, confirmed by MALDI-TOF. On a sizing column (Superdex 75) the estimated molecular weight is ~32 kDa, suggesting a dimer in solution.
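As a quick in-silico check of the cloning design described above, the sketch below scans the two pXO1-118 primers for the NdeI (CATATG) and BamHI (GGATCC) recognition sequences. The primer sequences are copied from the text. The helper name `find_sites` is illustrative. Both recognition sequences are palindromic, so searching only the given strand is sufficient.

```python
# Recognition sequences of the two enzymes used for the pET-28a cloning step.
SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primer sequences as given in the text (5' -> 3').
PRIMERS = {
    "pXO1-118 forward": "GAGTGGACATATGGAAGCAACAAAACG",
    "pXO1-118 reverse": "CTATAGGATCCAAAAATTTCAAGGTG",
}


def find_sites(seq: str, sites: dict[str, str]) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition sequence found in seq."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer, SITES))
```

The forward primer should report a single NdeI site and the reverse primer a single BamHI site, which matches the directional NdeI/BamHI cloning described.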
Cloning, expression and purification of atxA

The ORF encoding atxA was cloned into a pET-15b vector (Novagen) and transformed into BL21(DE3) cells. On SDS-PAGE the protein runs at the expected molecular weight of 55.56 kDa, which was confirmed by MALDI-TOF. On a sizing column the estimated molecular weight is ~115 kDa, suggesting a dimer in solution.
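The dimer assignments above rest on the ratio of the apparent size-exclusion mass to the monomer mass. The sketch below shows that arithmetic using the values quoted in the text; the function name is illustrative. Elution volume also depends on molecular shape, so the rounded ratio is only a rough indication of oligomeric state. BAS-1, at a ratio of about 1.7, is the least clear-cut case.

```python
def oligomer_estimate(apparent_kda: float, monomer_kda: float) -> tuple[float, int]:
    """Return (apparent/monomer mass ratio, nearest whole-number oligomeric state)."""
    if apparent_kda <= 0 or monomer_kda <= 0:
        raise ValueError("masses must be positive")
    ratio = apparent_kda / monomer_kda
    return ratio, max(1, round(ratio))


# Apparent (size-exclusion) and monomer (SDS-PAGE / MALDI-TOF) masses from the text, in kDa.
SAMPLES = {
    "BAS-1": (32.0, 18.51),
    "BAS-2": (32.0, 16.4),
    "AtxA": (115.0, 55.56),
}

if __name__ == "__main__":
    for name, (apparent, monomer) in SAMPLES.items():
        ratio, n = oligomer_estimate(apparent, monomer)
        print(f"{name}: ratio {ratio:.2f} -> {n}-mer")
```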

Crystallization,  data  collection,  and  structure  solution 

Purified native and SeMet-substituted BAS-1 was crystallized by vapor diffusion at room temperature in sitting and hanging drops containing 3 µl of precipitant solution (40% (v/v) PEG 300, 100 mM Tris-HCl pH 5.4, 5% (w/v) PEG 1000) and 3 µl of protein solution (14 mg/ml). Crystals appeared within 3 days and belonged to space group P3221 with unit cell parameters a = b = 89.86 Å, c = 35.25 Å, α = β = 90°, γ = 120°, corresponding to a Matthews coefficient of 2.2 Å³/Da (44.2% solvent). The crystals were rod-shaped, with dimensions of 0.1 mm x 0.05 mm x 0.05 mm. Data collection statistics are summarized in Table 1. Because the crystallization solution was already cryoprotective, crystals were flash-cooled directly in liquid nitrogen. One native dataset was collected at SLAC (Stanford Linear Accelerator Center) SSRL beamline 9-2, and one Se-SAD (single-wavelength anomalous dispersion) dataset at BNL (Brookhaven National Laboratory) NSLS beamline X26C. Diffraction images were processed and scaled with HKL [Otwinowski, 1997 #23]. The program SOLVE [Terwilliger, 1999 #24] located four Se positions in the BAS-1 structure, which were used to obtain initial phases (figure of merit [FOM] = 0.32). Phase improvement with RESOLVE [Terwilliger, 2001 #25] raised the FOM to 0.60. Automatic model building with RESOLVE then produced model fragments of 9 chains with 83 residues, at a model completeness of 77%. Further model building was performed manually in O [Kleywegt, 2001 #20].
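The Matthews coefficient quoted for BAS-1 follows from the cell volume, the number of asymmetric units per cell and the molecular mass: V_M = V_cell / (Z · n · MW). The solvent fraction is 1 - 1.23/V_M, where 1.23 corresponds to a protein partial specific volume of about 0.74 cm³/g.

The sketch below makes three assumptions:
- one molecule per asymmetric unit;
- six asymmetric units per cell, as for space group P3221;
- the 18.51 kDa mass from the text.

Under these assumptions it gives V_M ≈ 2.2 Å³/Da and about 45% solvent, close to the values stated above. Function names are illustrative.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume (Å^3) from edges (Å) and angles (degrees); general triclinic formula."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def matthews(volume, n_asym_units, n_mol_per_asu, mw_da):
    """Return (Matthews coefficient V_M in Å^3/Da, solvent fraction 1 - 1.23/V_M)."""
    vm = volume / (n_asym_units * n_mol_per_asu * mw_da)
    return vm, 1.0 - 1.23 / vm


if __name__ == "__main__":
    # BAS-1 in P3221: six asymmetric units per cell; one molecule per ASU is assumed.
    v = cell_volume(89.86, 89.86, 35.25, gamma=120.0)
    vm, solvent = matthews(v, n_asym_units=6, n_mol_per_asu=1, mw_da=18510)
    print(f"V_M = {vm:.2f} A^3/Da, solvent = {100 * solvent:.1f}%")
```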

Purified BAS-2 was crystallized by microbatch under paraffin oil. A well-diffracting crystal grew within two days from 1 M NaI, 20% (v/v) PEG 3350. BAS-2 crystallizes in both tetragonal and orthorhombic crystal systems. The tetragonal crystal form was obtained by vapor diffusion at room temperature using 0.1 M Tris-HCl pH 8.5, 30% (v/v) PEG 4000, 2 M Li2SO4. These crystals were rod-shaped, with dimensions of 0.7 mm x 0.3 mm x 0.3 mm. The crystal used for structure solution belonged to space group P212121 with unit cell parameters a = 44 Å, b = 62 Å, c = 124 Å, α = β = γ = 90°. Data were processed with DENZO and SCALEPACK [Otwinowski, 1997 #23]. Data collection statistics are summarized in Table 1. High- and low-resolution data sets were collected at SLAC (Stanford Linear Accelerator Center) to 1.49 Å and combined. The structure was solved by molecular replacement using the BAS-1 structure as the search model. Model building and refinement were carried out in O [Kleywegt, 2001 #20] and REFMAC5 [Murshudov, 1997 #40]. The asymmetric unit contains two molecules, corresponding to a Matthews coefficient of 2.9 Å³/Da (56.7% solvent).


Refinement. 

The initial model for BAS-1 refinement contained one chain of residues 2 to 147. One round of rigid-body refinement was carried out against the native data to 1.76 Å resolution. This was followed by a simulated-annealing step against a maximum-likelihood target with the programs CNS [Brunger, 1998 #21] and REFMAC5 [Murshudov, 1997 #40]. Electron density for a ligand was located in the active site. The undecanoic acid model was constructed with the program PRODRG [Schuettelkopf, 2004 #35] and manually positioned with O into the remaining density in the active site of BAS-1. The initial model for BAS-2 contained two chains of residues 5-136. Each cycle of refinement was followed by manual model rebuilding in O. During the final stages of refinement of the BAS-1 structure, alternate conformations were modeled and refined.

The final BAS-1 model has Rwork = 18.50% and Rfree = 24.10% for data between 76.7 and 1.76 Å resolution. Average B factors (in Å²) were 25.00 for main-chain atoms, 32.65 for side-chain atoms, 34.62 for solvent atoms and 37.42 for ligands and ions. The final BAS-2 model has Rwork = 17.70% and Rfree = 20.90% for data between 62.02 and 1.49 Å resolution. Superposition of BAS-2 chain A onto chain B over 132 atoms gives an RMSD of 0.434 Å. Average B factors (in Å²) were 18.90 for main-chain atoms, 23.95 for side-chain atoms, 35.90 for solvent atoms and 39.66 for ions.
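The chain A/chain B comparison above is an RMSD after optimal superposition. The sketch below implements the standard Kabsch procedure. It assumes the equivalent atoms of the two chains have already been paired into N x 3 coordinate arrays. Which atoms were used for the 0.434 Å value is not stated beyond their number (132). The function name is illustrative.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between paired N x 3 coordinate sets after optimal rigid-body superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of paired coordinates")
    # Remove translation by centring both sets on their centroids.
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal rotation (Kabsch).
    h = pc.T @ qc
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = pc @ rot.T - qc
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

A rotated and translated copy of a coordinate set gives an RMSD of essentially zero. Adding coordinate noise gives a value no larger than the unsuperposed RMSD.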


Quality  and  deposition  of  the  Crystallographic  Models. 

The BAS-1 structure was evaluated with the program PROCHECK [Laskowski, 1993 #32]: 97.3% of the residues are in the most favoured region of the Ramachandran plot and 2.7% are in allowed regions. Residues Ser15 and Phe19 adopt two alternate conformations in chain A. The BAS-2 structure was also evaluated with PROCHECK [Laskowski, 1993 #32]: 98.4% of the residues are in the most favoured region of the Ramachandran plot and 1.6% are in allowed regions. Residues Ser21, Ser48, Asp76 and Lys114 adopt two alternate conformations in chain A. In chain B, only Ser48, Ser67 and Tyr132 were observed in two alternate conformations. The coordinates of the BAS-1 and BAS-2 structures have been deposited in the Protein Data Bank under accession codes 1Y87 and 1YKU, respectively.


Gas Chromatography-Mass Spectrometry (GC-MS) of the ligand.

Chloroform (200 µl) was added to 100 µl of BAS-1 protein solution (10 mg/ml). This two-phase system was sonicated for 10 minutes, incubated at 70 °C for one hour and centrifuged. The organic phase was removed with a syringe. To verify the carboxylic acid group of the fatty acid, the extract was derivatized with 20 µl of BSTFA (N,O-bis(trimethylsilyl)trifluoroacetamide) and 20 µl pyridine and incubated for 1.5 h at 65 °C. Samples were evaporated to dryness under a stream of N2 gas, reconstituted in 100 µl methylene chloride and analyzed by GC-MS (Scripps Center for Mass Spectrometry, CA, USA).
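BSTFA converts the carboxylic acid into its trimethylsilyl (TMS) ester. This replaces the acidic hydrogen with Si(CH3)3, a net gain of C3H8Si (about 72.04 u). The sketch below computes the expected monoisotopic masses of the free acids and their TMS esters, for example undecanoic acid (C11H22O2), the ligand modeled in BAS-1.

It assumes saturated monocarboxylic acids; branched iso/anteiso isomers share the same formula. It also ignores fragmentation, so it only gives the molecular-ion masses to look for. Helper names are illustrative.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956, "Si": 27.9769265325}


def mono_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def saturated_fatty_acid(n_carbons: int) -> dict[str, int]:
    """Formula of a saturated monocarboxylic acid, CnH2nO2."""
    return {"C": n_carbons, "H": 2 * n_carbons, "O": 2}


def tms_ester(acid: dict[str, int]) -> dict[str, int]:
    """Replace the acidic H with Si(CH3)3: net change +C3H8Si."""
    ester = dict(acid)
    ester["C"] = ester.get("C", 0) + 3
    ester["H"] = ester.get("H", 0) + 8
    ester["Si"] = ester.get("Si", 0) + 1
    return ester


if __name__ == "__main__":
    for n in (11, 14, 15, 16):
        acid = saturated_fatty_acid(n)
        print(f"C{n}:0 acid {mono_mass(acid):.4f}  TMS ester {mono_mass(tms_ester(acid)):.4f}")
```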

Isothermal  Titration  Calorimetry 

Isothermal titration calorimetry (ITC) was performed on a VP-ITC calorimeter from Microcal (Northampton, MA). Fatty acid solutions (1.6-2.6 mM) were injected in 8-µl aliquots into the cell containing 100 µM protein (BAS-1 or BAS-2). The fatty acids tested were:
- myristic acid (n-C14:0) and palmitic acid (n-C16:0), from Sigma-Aldrich Co. (St. Louis, MO, USA);
- 12-methyltetradecanoic acid (anteiso-C15:0) and 13-methyltetradecanoic acid (iso-C15:0), from Indofine Chemical Co. (NJ, USA);
- palmitoleic acid, from Fluka.

Each experiment comprised 37 injections, and all titrations were performed at 23 °C. ITC samples also contained 20 mM Tris pH 7.4 and either 500 mM (BAS-2) or 1000 mM (BAS-1) NaCl. Experimental data were analyzed with the Microcal Origin software provided by the ITC manufacturer (Microcal, Northampton, MA).
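The Wiseman c parameter, c = n[M]/Kd, indicates how well a titration can define Kd; a commonly quoted useful range is roughly 1 to 1000. The sketch below evaluates c for the 100 µM protein concentration used here together with the Kd range reported in Table 3. It also estimates the final ligand/protein molar ratio from the injection schedule.

It makes three assumptions:
- a cell volume of about 1.4 ml for the VP-ITC;
- a single binding site (n = 1);
- no displacement of cell contents during injection.

Function names are illustrative.

```python
def c_value(protein_molar: float, kd_molar: float, n_sites: float = 1.0) -> float:
    """Wiseman c = n * [M]cell / Kd (dimensionless)."""
    return n_sites * protein_molar / kd_molar


def final_molar_ratio(n_injections: int, injection_l: float, ligand_molar: float,
                      cell_l: float, protein_molar: float) -> float:
    """Total ligand injected divided by protein in the cell, ignoring dilution and overflow."""
    return (n_injections * injection_l * ligand_molar) / (cell_l * protein_molar)


if __name__ == "__main__":
    for kd_um in (13, 46):
        print(f"Kd {kd_um} uM -> c = {c_value(100e-6, kd_um * 1e-6):.1f}")
    # 37 injections of 8 ul of 2.3 mM ligand into an assumed ~1.4 ml cell of 100 uM protein.
    print(f"final ratio ~ {final_molar_ratio(37, 8e-6, 2.3e-3, 1.4e-3, 100e-6):.1f}")
```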


Bacterial  strains  and  growth  conditions 

Functional analysis was carried out in the B. anthracis Sterne strain 34F2. Cells were grown in LB medium or Schaeffer's sporulation medium [Schaeffer, 1965 #65]. Transformation by electroporation was carried out according to Koehler et al. [Koehler, 1994 #62]. Unmethylated DNA was obtained by passing plasmid constructs through the dam mutant strain SCS110 (Stratagene). E. coli DH5α was used for plasmid construction and propagation. Antibiotics were used at the following concentrations in E. coli and B. anthracis, respectively: kanamycin 30 µg/ml and 7.5 µg/ml; chloramphenicol 10 µg/ml and 7.5 µg/ml; spectinomycin 100 µg/ml and 200 µg/ml. Ampicillin was used at 100 µg/ml for E. coli only. β-Galactosidase assays were carried out as previously described [Brunsing, 2005 #60; Ferrari, 1985 #61; Miller, 1972 #63]. Protein interaction analysis was carried out essentially as described in the Clontech two-hybrid system manual.
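For reference, the classical Miller (1972) definition of β-galactosidase activity normalizes the ONPG hydrolysis signal to reaction time, assayed culture volume and cell density. The sketch below assumes that standard formula. The Bacillus protocol cited above (Ferrari et al., 1985) may differ in detail, and the readings in the usage line are hypothetical.

```python
def miller_units(od420: float, od550: float, od600: float,
                 minutes: float, volume_ml: float) -> float:
    """Miller units = 1000 * (OD420 - 1.75 * OD550) / (t * v * OD600).

    od420: absorbance of the o-nitrophenol product
    od550: correction for light scattering by cell debris
    od600: culture density at sampling
    minutes: reaction time; volume_ml: culture volume assayed (ml)
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


# Hypothetical readings, for illustration only.
print(round(miller_units(od420=0.50, od550=0.02, od600=0.80, minutes=15, volume_ml=0.1), 1))
```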


Plasmid  constructions 


The plasmid for E. coli overexpression and purification of ORF118 was obtained as follows. The coding sequence was PCR-amplified using oligonucleotides BaORF1185'Nde (5'-GAGTGGACATATGGAAGCAACAAAACG-3') and BaORF1183'Bam (5'-CTATAGGATCCAAAAATTTCAAGGTG-3') and cloned into plasmid pET-28a (Novagen) digested with NdeI and BamHI.

Transcriptional fusions to the E. coli lacZ gene were constructed in the replicative vector pTCVlac [Poyart, 1997 #64]. The promoter region of ORF118 was amplified using oligonucleotides p1185'Eco2 (5'-CTATTGAATTCATTGATAAAGTGTAG-3') and p1183'Bam2 (5'-TAAATGGATCCTGGCTTTCTTTTAGG-3'). The promoter region of ORF61 was PCR-amplified using oligonucleotides pXO261-5'Eco (5'-GTTTAGAATTCTGAAATATTTTAATAGAC-3') and pXO261-3'Bam (5'-CTTTTGGATCCAATCAGATATAAATTTTTC-3'). The fragments were digested with EcoRI and BamHI and cloned into similarly digested pTCVlac.

Plasmid pORICm was used for the construction of the ORF118 deletion strain [Brunsing, 2005 #60]. A 720-bp fragment downstream of ORF118 was PCR-amplified using oligonucleotides Delta118Kpn (5'-AATAAGGTACCTTAAGTAATAAATAC-3') and Delta118Bam (5'-ATATTGGATCCTAAAAAAGAAATATAAC-3') and cloned into pORICm at the KpnI and BamHI sites. An 860-bp fragment upstream of ORF118 was PCR-amplified using oligonucleotides Delta118Sal (5'-CATAAGTCGACTCCTTAATTCCTTAAAAATC-3') and Delta118Pst (5'-TATTACTGCAGGGAAACGGCCAATAATC-3') and cloned into the resulting plasmid at the SalI-PstI sites. Finally, a blunt-ended spectinomycin cassette was cloned at the HincII site in the vector multiple cloning site, between the two cloned fragments. The resulting plasmid was transformed into strain 34F2 and used to generate a deletion-spectinomycin replacement of ORF118, essentially as described [Brunsing, 2005 #60].

The promoter region of atxA was PCR-amplified using oligonucleotides Delta118Eco2 (5'-TTCCAGAATTCCACTCCTTAATTCC-3') and AtxA3'Bam (5'-CAAATGGATCCAGGGCATTTATATTATC-3'). The fragment was digested with EcoRI and EcoRV (the latter site occurs naturally in the atxA gene), and the 360-bp fragment was cloned into pTCVlac digested with EcoRI and SmaI.

Plasmid pORICm was also used for the construction of the atxA deletion strain. The atxA coding region and upstream sequences were PCR-amplified using oligonucleotides Bal ISdelta (5'-TTAATGAATTCTCGCATATACATTGTGAATAC-3') and AtxA3'Bam (5'-CAAATGGATCCAGGGCATTTATATTATC-3') and cloned into the EcoRI-BamHI sites of pORICm. The resulting plasmid was digested with BclI and EcoRV, and the excised 670-bp fragment was replaced by the spectinomycin cassette as a BamHI-HincII fragment. This plasmid was used to transform strain 34F2 and to generate a deletion-replacement of the atxA gene, essentially as described [Brunsing, 2005 #60].

The gene encoding ORF118 was cloned into the two-hybrid system vectors pGBT9 and pGAD424 (Clontech) as an EcoRI-BamHI fragment. The fragment was obtained by PCR amplification with oligonucleotide THS1185'Eco (5'-AATTAGAATTCGGAGGAATGGAAGCAACAAAACGATAC-3') and BaORF1183'Bam (described above). The atxA gene was cloned into the pGBT9 and pGAD424 plasmids using oligonucleotides AtxA5'EcoRI (5'-TTATAGAATTCCTAACACCGATATCCATA-3') and AtxA3'Bam (5'-CAAATGGATCCAGGGCATTTATATTATC-3'). An EcoRI linker with the sequence 5'-GAATTCTTGCCGGGACCTCTTCCGGGTCCGGAACTTCCTGGACCGGAGGGAATTC-3' was then inserted into the EcoRI site to make the fusion protein more flexible. All PCR reactions were carried out either on the full genome of strain 34F2, extracted with the UltraClean Microbial DNA Isolation Kit (Mo Bio, Solana Beach, California), or on purified pXO2 plasmid DNA (generously provided by Philip Hanna).
Figure Legends

Figure 1. Predicted ORFs and physical maps of plasmids pXO1, pBC218 and pXO2.

A Part of the pXO1 plasmid of Bacillus anthracis. Arrows indicate the direction of transcription of each ORF relative to the other ORFs. BAS-1, encoded by pXO1-118, is shown, as are atxA (anthrax toxin activator), cya (edema factor gene) and pagA (protective antigen gene).

B Part of the pBC218 plasmid from Bacillus cereus strain G9241 [Hoffmaster, 2004 #47]. Arrows indicate the direction of transcription of each ORF relative to the other ORFs. The homologous protein (locus ZP_00236329), encoded by 0049, is shown, as is atxA (anthrax toxin activator).

C Part of the pXO2 plasmid of Bacillus anthracis. Arrows indicate the direction of transcription of each ORF relative to the other ORFs. BAS-2, encoded by pXO2-61, is shown, as are acpA (gene encoding a positive transactivator of capsule synthesis) and capB (capsule biosynthesis operon B) [Drysdale, 2004 #53].


Figure 2. Stereo view, ribbon representations and size-exclusion chromatography of BAS-1 and BAS-2.
A The BAS-1 Cα backbone is shown in black and the BAS-2 backbone in red. N- and C-termini are labeled and every tenth Cα is numbered.

B The BAS-1 dimer in ribbon representation, coloured from the N-terminus (blue) to the C-terminus (red). Helices H1-H6 are indicated.

C The BAS-2 monomer in ribbon representation, coloured from the N-terminus (blue) to the C-terminus (red). Helices H1-H6 are indicated. Panels A and B were produced with PyMOL [DeLano, 2002 #9].

D Size-exclusion elution profiles of BAS-1 and BAS-2, detected by absorbance at 280 nm. BAS-1 (red) elutes at 10.99 ml and BAS-2 (black) at 10.98 ml. Molecular weight standards are shown in green; their molecular weights are indicated above the arrows.


Figure 3. Sequence alignment of BAS-1 with homologous amino acid sequences, and electrostatic potential surfaces of the BAS-1 and BAS-2 dimers.


A GenBank at the NCBI was searched with the amino acid sequence of BAS-1 using PSI-BLAST with default parameters. This identified a homologue from Bacillus anthracis, BAS-2, with an e-value of e"^^, and a homologue from Bacillus cereus (ZP_00236329 / gi:47565287) with an e-value of …. The KIAxER domain, highlighted in green in the alignment, was also found in the N-terminal domains of the following sensor histidine kinases:
- one from Bacillus anthracis (Ba.a.: NP_844676 / gi:30262299);
- three from Bacillus cereus (Ba.c.1: NP_978635 / gi:42781388, Ba.c.2: ZP_00236689 / gi:47565649, Ba.c.3: YP_083662 / gi:52143167);
- one from Bacillus thuringiensis (Ba.th.: I_40575 / gi:2127280).

The alignment was carried out with CLUSTAL W [Higgins D., 1994 #38]. The secondary structure determined for BAS-1 and BAS-2 is shown above the alignment; the α-helices are numbered as in Figure 2B and 2C. Residues highlighted in yellow line the hydrophobic cavity, and residues highlighted in red lie within the dimer interface. Cysteines are highlighted in cyan. The exception is the two cysteines of BAS-1 and of the Bacillus cereus homologue that lie within the dimer interface; these are highlighted in magenta.

B The surface of the BAS-1 dimer, colored by electrostatic surface potential at ±12 kBT/e (positive in blue, negative in red). Zones E1 to E4 are shown. Residues 1 to 5 are omitted to allow better comparison with the electrostatic surface of BAS-2.

C The surface of the BAS-2 dimer, colored by electrostatic surface potential at ±12 kBT/e (positive in blue, negative in red). Zones E1 to E4 are shown. Both dimers are shown in the same orientation; because of the dimer symmetry, only one side of each needs to be shown. This figure was prepared with SPOCK [http://mackerel.tamu.edu/spock/, #44].

D Homologous sequences mapped onto the BAS-1 model, with the surface of BAS-2 shown for comparison. The mapped sequences are:
- the Bacillus cereus homologue (ZP_00236329 / gi:47565287);
- the N-terminal domain of one Bacillus anthracis sensor histidine kinase (Ba.a.: NP_844676 / gi:30262299);
- three sensor histidine kinases from Bacillus cereus (Ba.c.1: NP_978635 / gi:42781388, Ba.c.2: ZP_00236689 / gi:47565649, Ba.c.3: YP_083662 / gi:52143167);
- one sensor histidine kinase from Bacillus thuringiensis (Ba.th.: I_40575 / gi:2127280).

The surfaces are colored by their electrostatic surface potential at ±12 kBT/e (positive in blue, negative in red). This figure was prepared with SPOCK.

Figure 4. Side-by-side view of the BAS-1 structure and the oxygen sensor from Bacillus subtilis.

The BAS-1 structure has been superimposed on the oxygen sensor structure from Bacillus subtilis. Both are shown in ribbon representation, with corresponding helices colored identically. Helices 4 and 5 show the spatial differences between the two structures most clearly. Undecanoic acid is shown in the BAS-1 structure and haemin in the Bacillus subtilis oxygen sensor.

Figure 5. Cavity grid representation of BAS-1 and BAS-2, and stereo view of the remaining electron density in the cavity of BAS-1.

A Ribbon representation of BAS-1, with the calculated hydrophobic cavity shown as a pink grid. Residues at the cavity entrance (His-33, His-35, Val-82, Glu-86) are shown.

B Ribbon representation of BAS-2, with the calculated hydrophobic cavity shown as a pink grid. Residues at the cavity entrance (Arg-30, Arg-32, Val-79, Glu-83) are shown.
C The 2Fobs - Fcalc electron density map, contoured at 1σ. Undecanoic acid has been modeled into the remaining electron density. Two spherical electron densities close to the carboxylic group of the ligand probably correspond to a Na+ and a Cl- ion. Waters W1 and W2 are shown as red spheres, the chloride ion as a pink sphere and the sodium ion as a blue sphere.

D Putative active site of BAS-1.

Residues are shown in ball-and-stick representation. Shown are the KIAxER motif residues and the following groups involved in hydrogen bonding and salt bridges:
- residues His-30, Asp-33, Tyr-35, Glu-38, Asn-42, Lys-69, Ile-70, Ala-71, Glu-73, Arg-74, Asp-83, Phe-84 and Asn-87;
- waters 62, 63, 65 and 95;
- the potassium ion;
- fatty acid ligand II A.

Figure 6. Superposition of the KIAxER motif

The green worm represents the BAS-1 structure and the yellow worm the BAS-2 structure. The motif residues Lys-70, Ile-71, Ala-72, Glu-74 and Arg-75 are shown.

Figure  7 

A Representative ITC titration for binding of palmitoleic acid to BAS-1. A 2.3 mM palmitoleic acid solution was titrated into 100 µM BAS-1. The other fatty acids tested (data not shown) bound BAS-1 in a manner similar to palmitoleic acid.

B Representative ITC titration for binding of 13-methyltetradecanoic acid to BAS-2. A 1.6 mM 13-methyltetradecanoic acid solution was titrated into 100 µM BAS-2. The other fatty acids tested (data not shown) bound BAS-2 in a manner similar to 13-methyltetradecanoic acid. Experimental conditions were as described in Experimental Procedures.

Table  3.  Thermodynamic  parameters  obtained  from  ITC  titrations  of  BAS-1  and  BAS-2  with 
selected  fatty  acids. 

Figure  8 

A Transcription analysis of the atxA promoter in the ORF118 deletion strain. β-Galactosidase assays were carried out on B. anthracis cultures grown in LB supplemented with kanamycin at 7.5 µg/ml. Open symbols: growth curves; closed symbols: Miller units.
Strains and symbols: 34F2/pTCVlac-atxA: -O-; 34F2Δ118/pTCVlac-atxA: -V-.
B Transcription analysis of the ORF118 and ORF62 promoters in the atxA deletion strain. β-Galactosidase assays were carried out on B. anthracis cultures grown in LB medium containing kanamycin at 7.5 µg/ml. Open symbols: growth curves; closed symbols: Miller units. Strains and symbols: 34F2/pTCVlac-118: -V-; 34F2ΔatxA/pTCVlac-118: -◇-; 34F2/pTCVlac-62: -O-; 34F2ΔatxA/pTCVlac-62: -△-.
BAS-1        MEATKRYLCLYLKESQEKFISNWKKRILVHEHDPYKNEIIKNGTHLLHV            49
BAS-2        MEEIKCLLCRYLKERQEKFISDWKKKVIIRERDPYKEEIIKNGEHLLSA            49
ZP_00236329  MEVTKRYLCLYLKESQEKFISNWKKRILVYEHDIHKEEIINNGVQLLHA            49
Ba.c.1       MEVFPIDKDIKEVFCSHLKNNRHQFVENWKNKMIISDKDPFRLEAVQNGEDLLEF      55
Ba.c.2       MEMEGMEVFPIDKDIKEVFCSHLKNNRHQFVENWKNKMIISDKDPFRLEVVQNGEDLLEF 60
Ba.a.        MEMEGMEVFPIDKDIKEVFCSHLKNNRHQFVENWKNKMIISDKDPFRLEVVQNGEDLLEF 60
Ba.c.3       MEMEGMEVFPIDKDIKEVFCSHLKNNRHQFVENWKNKMIISDKDPFRLEVVQNGEDLLEF 60
Ba.th.       MEVFPIDKDIKEIFCSHLKNNRHQFVENWKNKMIISEKDPFKLEVVQNGEDLLEL      55
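The printed alignment suggests that the 49-residue N-terminal segments of BAS-1 and BAS-2 align without gaps. Under that assumption, pairwise identity over the segment can be counted directly. The sketch below skips gap characters and reports identical positions, compared positions and percent identity; the function name is illustrative. It describes only this segment, not the full-length proteins.

```python
BAS1_NTERM = "MEATKRYLCLYLKESQEKFISNWKKRILVHEHDPYKNEIIKNGTHLLHV"
BAS2_NTERM = "MEEIKCLLCRYLKERQEKFISDWKKKVIIRERDPYKEEIIKNGEHLLSA"


def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Identity between two equal-length aligned sequences; columns with a gap ('-') are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    same = sum(x == y for x, y in pairs)
    return same, len(pairs), 100.0 * same / len(pairs)


if __name__ == "__main__":
    same, compared, pct = percent_identity(BAS1_NTERM, BAS2_NTERM)
    print(f"{same}/{compared} identical ({pct:.1f}%)")
```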
Table 1. Statistics of SAD data collection, phasing and refinement.

                                   SAD phasing     Model refinement
                                   λpeak BAS-1     BAS-1           BAS-2
Wavelength (Å)                     0.9781          0.97923         0.97923
Resolution range (Å)               50-2.5          76.7-1.76       62.02-1.49
Observations                       70890           173637          408633
Unique reflections                 5888            16461           56717
Completeness (%)^a                 99.8 (100.0)    99.5 (95.0)     98.9 (94.5)
Rsym (%)^b                         6.9 (24.7)      5.7 (46.3)      6.8 (31.5)
Rcryst/Rfree (%)^c,d                               18.50/24.10     17.70/20.90
Protein atoms                                      1408            2628
Water molecules                                    95              364
Ligand atoms                                       1               34
Ligand molecules                                   1
R.m.s. deviations [Engh, 1991 #36]
  Bonds (Å)                                        0.027           0.012
  Angles (°)                                       1.697           1.367
Average B-factor (Å²)
  Protein main chain                               25.00           18.90
  Protein side chain                               32.65           23.95
  Water                                            34.62           35.90
  Ligands                                          37.42           39.66

^a Numbers in parentheses are for the highest-resolution shell.
^b $R_{sym} = \sum_h \sum_i |I_{h,i} - \langle I_h \rangle| \,/\, \sum_h \sum_i I_{h,i}$, where $\langle I_h \rangle$ is the average intensity over symmetry-equivalent reflections.
^c $R = \sum \big| |F_{obs}| - |F_{calc}| \big| \,/\, \sum |F_{obs}|$, where the summation is over the data used for refinement.
^d Rfree was calculated using 5% of the data excluded from refinement.
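The footnote definitions of Rsym and R translate directly into code. The sketch below assumes that symmetry-equivalent observations have already been mapped to a common unique index, and that Fobs and Fcalc are paired amplitudes. Rfree is the same R formula evaluated on the 5% test set only. Function names and the toy numbers are illustrative.

```python
from collections import defaultdict


def r_factor(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| over paired reflections."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_sym(observations):
    """Rsym = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i I_hi.

    observations: iterable of (unique_index, intensity) pairs, where unique_index
    already identifies the symmetry-equivalent group (e.g. a reduced hkl tuple).
    """
    groups = defaultdict(list)
    for index, intensity in observations:
        groups[index].append(intensity)
    num = den = 0.0
    for intensities in groups.values():
        mean = sum(intensities) / len(intensities)
        num += sum(abs(i - mean) for i in intensities)
        den += sum(intensities)
    return num / den


if __name__ == "__main__":
    print(r_factor([100.0, 200.0], [90.0, 210.0]))
    print(r_sym([((1, 0, 0), 10.0), ((1, 0, 0), 12.0), ((0, 1, 0), 5.0)]))
```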
Table 2. Residues lining the hydrophobic cavity of BAS-1 and BAS-2.

Hydrophobic cavity residues from BAS-1     Hydrophobic cavity residues from BAS-2
Phe-19, Phe-50, Phe-84, Phe-119, Phe-120   Phe-19, Phe-50, Phe-84, Phe-119, Phe-120
Ile-20, Ile-39, Ile-70, Ile-81, Ile-95     Ile-20, Ile-28, Ile-29, Ile-39, Ile-70, Ile-81, Ile-95
Trp-23                                     Trp-23
Leu-28, Leu-46, Leu-47, Leu-123            Leu-47, Leu-123
Val-29, Val-79                             Val-79
Pro-34                                     Pro-34
Asp-33, Asp-83                             Asp-33
Ala-77                                     Ala-77, Ala-91
Glu-38, Glu-73                             Glu-38, Glu-73, Glu-83
Arg-74                                     Arg-30, Arg-32, Arg-74
Asn-42, Asn-87                             Asn-42, Asn-87
Gly-43, Gly-91                             Gly-43
Thr-88                                     Thr-88
His-30, His-32                             -
Tyr-35                                     Tyr-35
Table 3. Affinities of pXO2-61 (BAS-2) and pXO1-118 (BAS-1) for selected fatty acids.

Protein     Ligand                          Kd (µM)
pXO2-61     Myristic acid                   40 ± 15
pXO2-61     Palmitic acid                   41 ± 17
pXO2-61     12-Methyltetradecanoic acid     41 ± 19
pXO2-61     13-Methyltetradecanoic acid     46 ± 6
pXO2-61     Palmitoleic acid                20 ± 1
pXO1-118    Myristic acid                   25 ± 9
pXO1-118    Palmitic acid                   24 ± 7
pXO1-118    12-Methyltetradecanoic acid     14 ± 3
pXO1-118    13-Methyltetradecanoic acid     13 ± 3
pXO1-118    Palmitoleic acid                30 ± 8
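Dissociation constants in the micromolar range correspond to modest binding free energies. Assuming the usual 1 M standard state and the 23 °C titration temperature, ΔG° = RT ln Kd can be evaluated as below. This is a sketch for orientation, not the thermodynamic analysis performed in Origin, and it covers only Kd values listed in the table. The function name is illustrative.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant, J/(mol*K)


def binding_free_energy_kj(kd_molar: float, temp_c: float = 23.0) -> float:
    """Standard binding free energy (kJ/mol) from a dissociation constant (1 M standard state)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    temp_k = temp_c + 273.15
    return R_J_PER_MOL_K * temp_k * math.log(kd_molar) / 1000.0


if __name__ == "__main__":
    for label, kd_um in (("pXO1-118 / 13-methyltetradecanoic acid", 13),
                         ("pXO2-61 / 13-methyltetradecanoic acid", 46)):
        print(f"{label}: {binding_free_energy_kj(kd_um * 1e-6):.1f} kJ/mol")
```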
ABSTRACT

Anthrax is caused by the bacterium Bacillus anthracis, whose virulence has been associated with two plasmids, pXO1 and pXO2. Using a combination of advanced bioinformatics tools, including context analysis, distant homology and fold recognition, we have re-annotated the predicted open reading frames on the pXO1 plasmid. Most of these were described as proteins of unknown function in previous analyses. With the improved annotation tools, the number of pXO1 ORFs with some level of functional annotation rises from 48 to over 100. The new results also clearly show the mosaic nature of pXO1 and give tantalizing hints about the origin of anthrax virulence. The highlights of the new findings are two type IV secretion system-like clusters on the pathogenicity island of the pXO1 plasmid and at least three clusters related to DNA processing.

Supplemental material is available online at http://bioinformatics.burnham.org/pXO1.


INTRODUCTION 


Anthrax is a disease primarily affecting herbivores, but it also sporadically attacks other mammals, including humans. Anthrax has been known since antiquity, and the quest for an effective treatment is closely linked to the birth of modern microbiology (Pasteur 1881). More recent work has concentrated mostly on the anthrax toxin, leading to extensive structural and functional analysis of its components (for a review see Turnbull 2002). However, until recently the general level of interest in anthrax was limited, since it is not a major threat to human health. A period of more intensive work on B. anthracis began when the military adopted anthrax as a biological weapon, creating the threat of large-scale anthrax outbreaks. This threat was kept alive by several large-scale incidents and, more recently, by the threat of anthrax as a bioterror weapon.

At the same time, the origin and mechanism of B. anthracis virulence are very interesting in their own right. Only a few B. anthracis virulence-related proteins have been studied in detail. Among them are the toxins (PagA, LEF, CyaA), cell envelope and germination genes (Cap, S-layer and Ger proteins), and the regulatory mechanisms triggering virulence (Fouet and Mesnage 2002; Lacy and Collier 2002, and citations therein). The sequencing of the B. anthracis genome (Okinaka et al. 1999; Pannucci et al. 2002; Read et al. 2003), especially in the context of other Bacillus genome projects, highlighted the complex and little-understood mechanism of its virulence (Koehler 2002). The B. anthracis genome consists of a single chromosome and two virulence-associated megaplasmids, pXO1 and pXO2 (Okinaka et al. 1999; Read et al. 2003). Together the two plasmids confer the pathogenic phenotype. They account for most of the difference between B. anthracis and relatives with different pathogenicity profiles, such as B. cereus or B. thuringiensis. However, little is known about most proteins encoded by the two plasmids, and only a few have been studied experimentally and shown to be directly involved in virulence. Most pXO1 and pXO2 proteins have no obvious sequence similarity to any other known genes. The interest in B. anthracis pathogenicity therefore extends beyond bioterrorism and human health to a fundamental question: how novel and complex lifestyles, such as pathogenicity, can evolve.


Several earlier works focused on bioinformatic analysis of the anthrax genome and plasmids, often in the
context of related organisms (Ariel et al. 2002; Ariel et al. 2003; Rasko et al. 2004). These studies
confirmed the close relations between B.anthracis, B.thuringiensis and B.cereus, and identified previously
unknown features of their virulence-related plasmids, pXO1, pBtoxis and pBc10987, respectively. However, the
vast majority of pXO1 genes remain uncharacterized, both in terms of their function and their origin. A possible
reason for this apparent novelty of pXO1 genes is that genes encoded on pathogenicity plasmids evolve rapidly and
often bear little sequence similarity to their homologs from other species, hampering the detection of
homology with most sequence analysis tools. In this study we take advantage of recent improvements in
highly sensitive tools for distant homology recognition. These include a profile-based variant of the BLAST
algorithm (Altschul et al. 1997), algorithms based on hidden Markov models (Bateman et al. 2002), and
profile-profile methods (Rychlewski et al. 2000). These algorithms are most often tested in the context
of structure and fold prediction (Kinch et al. 2003), where predictions can easily be validated by
comparing three-dimensional structures. They are also gaining acceptance in function prediction and
evolutionary analysis (Altschul and Koonin 1998; Sadreyev et al. 2003). In addition, context analysis, which
takes advantage of operon structure, has emerged as a powerful annotation tool in prokaryotes
(Overbeek et al. 1999; Huynen et al. 2000; Wolf et al. 2001), and we have combined its results with
those of distant homology detection to improve the annotation of the pXO1 plasmid.
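Context analysis starts from a grouping of genes into candidate operons. The exact criteria used in this study are not stated here, so the sketch below assumes a common heuristic: consecutive genes on the same strand separated by a short intergenic gap are treated as one putative operon. The function name `predict_operons`, the tuple layout and the 50 bp threshold are hypothetical choices for illustration only.

```python
from typing import List, Optional, Tuple

# (name, start, end, strand); coordinates are 1-based and inclusive
Gene = Tuple[str, int, int, str]


def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes into putative operons.

    Heuristic only: two neighbouring genes are joined when they lie on the
    same strand and the number of bases between them is at most `max_gap`
    (an assumed threshold, not one taken from the study).
    """
    operons: List[List[str]] = []
    prev: Optional[Gene] = None
    for gene in sorted(genes, key=lambda g: g[1]):
        name, start, _end, strand = gene
        same_unit = (
            prev is not None
            and strand == prev[3]
            and start - prev[2] - 1 <= max_gap
        )
        if same_unit:
            operons[-1].append(name)
        else:
            operons.append([name])
        prev = gene
    return operons


# Toy input with hypothetical gene names and coordinates
toy_genes = [
    ("geneA", 100, 400, "+"),
    ("geneB", 420, 900, "+"),
    ("geneC", 1500, 1800, "+"),
    ("geneD", 1820, 2000, "-"),
]
print(predict_operons(toy_genes))
```

Here geneA and geneB are joined (19 bp apart), geneC starts a new unit because of the long gap, and geneD is separated by the strand change. Real context analysis would additionally check whether the same gene neighbourhood is conserved across genomes.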

The origin of pathogenicity plasmids has often proved elusive, all the more so because most of their ORFs were
not annotated. Our annotation also allows us to put forward hypotheses on the evolutionary origin of the
ORFs encoded in pXO1, which represent an interesting mix of vertical and horizontal transfer. Thus we are
able to shed new light on the evolution of pathogenicity in the genus Bacillus.


RESULTS 


Overview  of  the  results 

The results of our annotation effort are summarized in Figure 1. All details are available as supplementary
material tables at http://bioinformatics.burnham.org/pXO1. Contrary to previous reports, these results show that
many pXO1 proteins do have recognizable homologues in other species. Overall, over 60 ORFs previously
described as unique could be reliably identified as members of known protein families. Still, for many of
them we are not able to confidently assign a molecular function. First, the complete functional groups (operons,
pathways) of many of the newly characterized proteins seem to be missing from pXO1. These groups may be
completed by other, as yet uncharacterized, proteins from the anthrax plasmids or genome, or the
protein may have acquired a different functional context in anthrax. Second, many ORFs appear truncated
and mutated to the point that it is unclear whether they have conserved the same function or, in fact,
whether they have any function at all (Supplementary data). This in turn might be related to the continuing
evolution of the plasmid, in which some genes are only partly degraded and still recognizable, like the region
homologous to a part of the lethal factor (see Particular cases section in Results) or a fragment of
NADH dehydrogenase (see Supplementary data).

Despite these reservations, interesting tendencies emerge from our functional annotations: pXO1 contains
many regulatory proteins with predicted DNA-binding domains, such as SinR (BXA0020, pXO1-14), AtxA
(BXA0146, pXO1-119) and the MerR homologue (BXA0069, pXO1-47). Another interesting trend is that
pXO1 encodes a significant number of proteins related to DNA metabolism (15% of the whole plasmid)
(Supplementary data). We have also identified several probable operons conserved among different groups
of bacteria.


DNA  level  analysis 

Several analyses of the DNA sequence of the pXO1 plasmid have been performed previously [Okinaka,
1999; Read, 2002; Pannucci, 2002]: ORF prediction programs were applied, DNA motifs were
discovered, and promoter elements were linked to ORFs. Our analysis of the
DNA sequence focused on two aspects. First, we sought to locate the origin of replication,
since no genes obviously involved in this process could be detected. Second, we searched for specific DNA
regions related to pathogenicity.

Our first goal was to find proteins directly involved in plasmid replication. Unfortunately, we could
not detect any. Therefore, we used the Oriloc program to predict the origin of replication [Frank,
2000]. In bacteria, the leading strand is enriched in keto bases (G, T), while the lagging
strand is enriched in amino bases (A, C) [Rocha, 1999]. This compositional asymmetry allows the
identification of probable origin and termination sites of replication. Oriloc analysis indicated a potential
origin of replication between bases 66538 and 66558, which is relatively close to the origin predicted earlier by
Berry and colleagues (region 60955-62192) [Berry, 2002]. The predicted origin lies in the neighbourhood of
hypothetical proteins with no recognizable homology to proteins in publicly available databases, between
ORFs BXA0076 (pXO1-51) and BXA0077 (pXO1-52). The replication terminus
may lie around position 173914 of the pXO1 plasmid, between genes BXA0206 (pXO1-137) and
BXA0207 (pXO1-138), which encode an RNA-binding Hfq (Host Factor I) protein and a transcriptional
regulator of the ArsR family, respectively.
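The compositional-asymmetry argument can be illustrated with a cumulative keto skew: walk along the sequence adding +1 for each keto base (G, T) and -1 for each amino base (A, C). Where the reading strand switches from lagging-like to leading-like composition the running sum reaches its minimum (a candidate origin), and it peaks at a candidate terminus. This is a simplified sketch under that assumption, not a reimplementation of Oriloc, whose exact skew definition may differ; the function names are hypothetical.

```python
from typing import List, Tuple


def cumulative_keto_skew(seq: str) -> List[int]:
    """Running sum of +1 per keto base (G, T) and -1 per amino base (A, C).

    Ambiguous symbols such as N leave the running sum unchanged.
    """
    total = 0
    profile: List[int] = []
    for base in seq.upper():
        if base in "GT":
            total += 1
        elif base in "AC":
            total -= 1
        profile.append(total)
    return profile


def predict_ori_ter(seq: str) -> Tuple[int, int]:
    """Return 1-based positions of the skew minimum (putative origin) and
    maximum (putative terminus); ties resolve to the first position."""
    profile = cumulative_keto_skew(seq)
    positions = range(len(profile))
    ori = min(positions, key=profile.__getitem__)
    ter = max(positions, key=profile.__getitem__)
    return ori + 1, ter + 1


# Toy sequence: an amino-rich half followed by a keto-rich half
print(predict_ori_ter("AAAACCCCGGGGTTTT"))
```

On a real plasmid the profile is noisy, so one would typically smooth it or compute it in windows; the toy sequence only shows the mechanics of locating the extremes.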

At the DNA level, we were also interested in finding regions involved in the regulation of virulence. We
focused on genes regulated by AtxA [Bourgogne, 2003], with the goal of characterizing DNA regions
involved in AtxA binding. For this purpose, we collected the intergenic sequences preceding the
AtxA-dependent genes (see Table 1 in [Bourgogne, 2003]) and analyzed them using the MEME [Bailey, 1994] and
MITRA [Eskin, 2002] programs. The only common motif that we could find was ANGGAG, which was
located at widely varying distances (5-600 bp) from the putative ATG translation start codon. Large differences
in the location of the ANGGAG motif can be attributed to unrecognized ORFs located upstream of some
of the analyzed genes, in the same operon. Another possibility is that this signal is spurious. Deletion
experiments on these cis elements should be performed to test our hypothesis.
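The positional spread of the ANGGAG motif can be measured with a simple scan. The sketch below assumes each input is the intergenic sequence ending immediately before the putative ATG, searches for overlapping matches of the degenerate pattern A[ACGT]GGAG, and reports how many bases separate each match from the start codon. It is not MEME or MITRA, which discover motifs de novo; the helper name and toy sequences are hypothetical.

```python
import re
from typing import List

# "N" in ANGGAG stands for any nucleotide; the lookahead allows overlapping hits
ANGGAG = re.compile(r"(?=(A[ACGT]GGAG))")


def motif_distances(upstream: str) -> List[int]:
    """Distances (bp) from the 3' end of each ANGGAG match to the ATG.

    `upstream` is assumed to end immediately before the start codon.
    """
    seq = upstream.upper()
    return [len(seq) - (m.start() + len(m.group(1))) for m in ANGGAG.finditer(seq)]


print(motif_distances("CCAAGGAGCCCCC"))
print(motif_distances("AAGGAGAAGGAG"))
```

Collecting these distances over all AtxA-dependent upstream regions would reproduce the kind of 5-600 bp spread described above.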

Protein  level  analysis 

Proposed  operons:  function  and  evolutionary  conservation 
A  pathogenicity  operon  conserved  in  Bacilli 

BXA0091 (pXO1-65) and BXA0094 are homologous to each other and to proteins from several other
bacilli: Enterococcus, Listeria, Lactococcus, Lactobacillus and other Bacillus species. The function of proteins
from this family is unknown, but they are predicted to be extracellular (Nakai and Horton 1999).
Many members of this family have additional C-terminal domains, often repeats such as WD or LRR
repeats, associated with protein-protein and receptor-like interactions. Not only in anthrax, but also in
E.faecalis and B.thuringiensis, this gene is present in at least two copies in each operon. In B.anthracis,
B.thuringiensis, L.innocua and E.faecalis the BXA0091 homologues colocalize with a surface layer domain
protein. Interestingly, in species other than anthrax, these two proteins often colocalize with three further proteins:
a protein homologous (FFAS score: -10.100) to a LysM domain-containing protein (the homology lies outside the
LysM region), a protein homologous to the RTX toxin and related Ca2+-binding protein family, and a
regulatory protein homologous to the MGA positive transcription regulators. The LysM domain binds
peptidoglycan and was first identified in bacterial lysins (Ponting et al. 1999). Several proteins, such as
staphylococcal IgG-binding proteins and E.coli intimins, contain LysM domains. RTX toxins are pore-forming,
calcium-dependent cytotoxins encoded by various bacterial genomes (Braun and Cossart 2000),
and MGA regulators are important in streptococcal virulence (McIver and Myles 2002). Other proteins from these
operons in other organisms are also predicted to be extracellular and involved in pathogenesis; in
B.anthracis this appears to be a minimal variant of this virulence-related operon.


A DNA-modifying operon shared with Gram-negative bacteria

BXA0010 (pXO1-06), BXA0013 (pXO1-08) and BXA0015 (pXO1-10) form an operon that can also be
found in two Gram-negative species, Xanthomonas and Burkholderia (Figure 1), and in the proteobacterial
Pseudomonas group. BXA0010 and BXA0013 are homologues of Xanthomonas orf8, of a Burkholderia
protein and of a number of Pseudomonas proteins. Both BXA0010 and BXA0013 belong to
superfamily II of DNA/RNA helicases, and BXA0010 seems to be a duplication of the middle part of
BXA0013. In B.anthracis, a reverse transcriptase (BXA0011, pXO1-07) is inserted between these two proteins.
One can hypothesize that this insertion occurred after the duplication and disrupted
BXA0010. BXA0013 forms an operon with BXA0015, a protein with strong similarity to the N-terminal
part of its homologues, which encodes the coenzyme-binding domain of various DNA methyltransferases. The
co-occurrence of the DNA/RNA helicase and DNA methyltransferase in one operon is also conserved in the other
species mentioned above. Xanthomonas, Burkholderia and Pseudomonas, but not anthrax, retain
numerous other proteins in operons analogous to BXA0013-BXA0015. The function of these additional
proteins is, however, unclear. From the functions of the known members of this operon, one can infer a
DNA-modifying function.

A  nucleotide  metabolism  operon  shared  with  Actinobacteria  and  Cyanobacteria 

BXA0032 and BXA0033 (pXO1-22), if fused, would belong to the COG0175 family of the
3'-phosphoadenosine 5'-phosphosulfate sulfotransferase (PAPS reductase)/FAD synthetase group of enzymes,
which are linked to an ATPase involved in DNA repair/chromosome segregation in Anabaena spp., Nostoc
spp., Bacillus stearothermophilus and Streptomyces avermitilis. The functions of other proteins from this cluster
are unknown. In B.anthracis, however, it is located close to BXA0034. We described the members of this
family as a new HEPN nucleotide-binding domain (Grynberg et al. 2003), and a connection with BXA0037
(pXO1-24), a nucleotidyltransferase domain protein, is apparent. As a complex they may catalyze the
addition of a nucleotidyl group to unknown substrates, possibly antibiotics or other toxic substances,
as their structural homologue kanamycin nucleotidyltransferase does (Matsumura et al. 1984). The specific
function of the HEPN-nucleotidyltransferase operon in pXO1 is unknown.

Type  IV  secretion  system  machinery:  two  operons  and  missing  links 

Two operons in B.anthracis contain proteins strongly resembling elements of the type IV secretion system
(Fig. 2). This secretion system is important in the delivery of effector molecules to the host
cell (Christie 2001; Christie and Vogel 2000).

The first operon consists of four proteins (BXA0083/pXO1-57, BXA0085/pXO1-59, BXA0086/pXO1-60
and BXA0087/pXO1-61), of which the first is homologous to CpaB/RcpC (COG3745), a protein involved in
type IV pili biogenesis. The next protein, BXA0085, belongs to the VirB11 family, and the remaining two
are paralogs belonging to the TadC family (COG2064), whose members are often found in the same
operons as VirB11. The VirB11 family is well studied (Christie 2001; Dang et al. 1999; Krause et al.
2000; Savvides et al. 2003; Yeo et al. 2000); its members are ATPases that function as
chaperones, reminiscent of the GroEL family, for translocating unfolded proteins across the cytoplasmic
membrane (Christie 2001). Homologues of all four proteins from this pili biogenesis-like operon form
operons in many Gram-negative bacterial species (Kachlany et al. 2000; Skerker and Shapiro 2000). To
date, only in Caulobacter crescentus has this operon been experimentally shown to be required for pilus
assembly (Skerker and Shapiro 2000). Distant homologs of pilA and other pilin subunits necessary for pilus
formation can be found scattered on pXO1 (for instance BXA0092) and on pXO2 (work in preparation).

The second operon contains a homologue of the VirB4 protein (BXA0107) and a protein (BXA0108, pXO1-79)
in which a VirB6 homology region is fused to a surface-located repetitive sequence, similar to coiled-coil
proteins, with a methyl-accepting chemotaxis protein (MCP) signaling domain at the C terminus. The VirB4
family is one of the elements of the type IV secretion system. This system, ancestrally related to the
conjugation machinery, is able to deliver DNA molecules as well as proteins. VirB4 is an ATPase that
"might transduce information, possibly in the form of ATP-induced conformational changes, across the
cytoplasmic membrane to extracytoplasmic subunits," according to Christie (Christie 2001) and Dang (Dang
et al. 1999). It contains the Walker A motif responsible for ATP binding, which is well conserved in
BXA0107 (residues 200-207: GISGSGKS). The BXA0108 protein has at least 7 predicted transmembrane
segments in its N-terminal region (residues 55-281), similar to the central part of the VirB6 protein, and a
surface-located repetitive sequence, most probably forming a coiled-coil structure. The C-terminal part of this
protein is homologous to a domain that is thought to transduce the external chemotaxis signal to the two-component
histidine kinase CheA (for review see Stock et al. 2002). The next protein in this operon resembles the
C-terminus of a Bacillus firmus integral membrane protein, which includes transmembrane domains in its
N-terminal part. This N-terminal region is homologous to phosphatidate cytidylyltransferase (EC 2.7.7.41), an enzyme
that catalyzes the synthesis of CDP-diglyceride, the source of phospholipids in all organisms (Icho et al.
1985; Sparrow and Raetz 1985). The function of the C-terminal part of the B.firmus protein is unknown.
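The Walker A (P-loop) motif is commonly written as the consensus GxxxxGK[ST]. A minimal regular-expression check, assuming that consensus, confirms that the BXA0107 fragment quoted above fits it. The function name is hypothetical, and a real analysis would scan the full-length protein, which is not reproduced here.

```python
import re
from typing import List, Tuple

# Walker A / P-loop consensus: G, any four residues, G, K, then S or T
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def find_walker_a(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for each Walker A hit."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(seq)]


# The eight residues reported for BXA0107 (positions 200-207)
fragment = "GISGSGKS"
print(find_walker_a(fragment))
```

Because only the fragment is scanned, the reported start is 1; adding 199 maps it back onto the full-length numbering (position 200).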

The presence of three proteins with features characteristic of the type IV secretion system, and of other ORFs
related to type IV pilus formation, strongly suggests that such a system may be encoded on the virulence
plasmids of anthrax and may play a role in its virulence. It therefore seems logical to search for other elements of
the type IV secretion system in the anthrax plasmids or genome. We are able to detect some other distantly
related elements of this machinery, but the system appears incomplete. Is it a fully functional, minimal type
IV secretion system? Or are other parts of this system present in anthrax but impossible to identify with
available tools? The operons discussed here are good targets for experimental analysis, since they contain
many as yet uncharacterized proteins. It is also not clear which molecules are secreted by this system: the
anthrax toxin or other proteins. In any case, understanding the function of this secretion system would be
crucial for understanding the diverse roles of pXO1 in virulence.

Putative pXO1 regulator proteins

Regulatory proteins are the most important elements in the description of unknown biological
systems: they determine which genes are expressed in the cell, and when and how. In pathogenic systems,
regulators of virulence genes are frequently located in the pathogenic regions. However, various permutations are known,
where regulators regulate genes outside the pathogenicity island, or regulators encoded outside the
pathogenicity island regulate genes located in the virulence regions (Hacker and Kaper 2000; Hentschel and
Hacker 2001). The anthrax pXO1 plasmid contains many uncharacterized regulatory proteins. We think that it is
essential to describe the regulators on the anthrax pathogenicity plasmid in order to decipher the physiology of
pXO1.

Specific  duplications  in  the  ArsR/SmtB  family:  BXA0166  and  BXA0207 

Both BXA0166 (pXO1-109) and BXA0207 (pXO1-138) are members of the ArsR/SmtB family of
metalloregulatory transcriptional regulators. The vast majority of known family members are repressors;
indeed, BXA0166 has been characterized as the gene for the repressor PagR (Hoffmaster and Koehler 1999).
These repressors act on operons linked to stress-inducing concentrations of diverse heavy metal ions, and derepression
results from direct binding of metal ions by the ArsR/SmtB regulators. The founding members of
the family are SmtB, the Zn(II)-responsive repressor from Synechococcus PCC 7942 (Morby et al. 1993),
and ArsR, the arsenic/antimony-responsive repressor of the ars operon in Escherichia coli (Wu
and Rosen 1991). Another, less well studied, group in the ArsR/SmtB family are the transcriptional
activators, with Vibrio cholerae HlyU as the founding member (Williams et al. 1993). HlyU is known to
upregulate the expression of hemolysin and of two hcp genes, which are coregulated with hemolysin
(Williams et al. 1996). We have conducted a phylogenetic analysis of this large family, focusing on the
evolutionary history of ArsR/SmtB proteins in bacilli, notably in anthrax, and on the relation between
phylogeny and function (i.e. repressor or activator).

In a phylogeny of representative members of the ArsR/SmtB family (Fig. 5A), the two pXO1
proteins group closely with other Bacillus proteins. This group has very long branches in the tree,
indicative of rapid evolution of the proteins. The only two known activators of the family (HlyU and NolR)
appear closely related, in a clade with proteins of unknown function. The latter include clear orthologs of
HlyU or of NolR. It is thus reasonable to predict that these proteins form a clade of transcriptional activators.
Interestingly, this "activator" clade appears closely related to the clade including both pXO1 proteins (clades
boxed in Fig. 5A). PagR is known to act as a repressor, but only weakly (Hoffmaster and Koehler
1999), and is suspected of having an activation function as well (Mignot et al. 2003). A more detailed
phylogeny of close homologues of the pXO1 proteins (Fig. 5B) shows that there has been a wave of gene
duplications in the ancestor of B.anthracis and B.cereus (filled circles in Fig. 5B). All seven of the resulting
paralogues were retained in B.anthracis, including the two that were transferred to pXO1, while four were
secondarily lost in B.cereus. There was an independent duplication in B.thuringiensis (open circle in Fig.
5B). Interestingly, these are the only bacilli represented in this clade of close homologues; all three have
duplications of the gene, and all three are pathogens.

Overall, the phylogenetic analysis shows that both pXO1 ArsR/SmtB proteins are closely related
members of a clade of fast-evolving proteins, which have duplicated several times in pathogenic bacilli and
which are related to the only clade of transcriptional activators in the family.

Other  putative  regulators 

BXA0020 (pXO1-14) is 564 amino acids long. Its C-terminal 60-70 residues are homologous to the DNA-binding
domains of several repressor families (SCOP: a.35.1, lambda repressor-like DNA-binding domains
superfamily). The most similar is the SinR repressor domain (Gaur et al. 1986). In Bacillus
subtilis, the proteins of the sin (sporulation inhibition) region form a component of an elaborate molecular
circuitry that regulates commitment to sporulation. SinR is a tetrameric repressor protein that binds to
the promoters of genes essential for entry into sporulation and prevents their transcription (Mandic-Mulec et
al. 1995; Mandic-Mulec et al. 1992). In pXO1, BXA0020 does not form an operon with sin genes. Instead, it
is located close to a protein (BXA0019, pXO1-13) characterized as similar to the middle fragment
(residues 417-1236) of the 236 kDa rhoptry protein from Plasmodium yoelii yoelii, which is directly involved in the
parasite's attack on red blood cells (Khan et al. 2001). It is not certain whether they form one operon, since
both genes have putative independent ribosome binding sites. The N-terminal region of BXA0020 is not
well characterized and has the strongest similarity to the α-helical part of the chromosome-associated kinesin, or
kinesin-like domain (KOG0244). Kinesins are microtubule-dependent molecular motors that play
important roles in intracellular transport of organelles and in cell division (Mandelkow and Mandelkow
2002; Woehlke and Schliwa 2000).

The N-terminal part of BXA0048 (pXO1-34) is a DNA-binding helix-turn-helix motif belonging to
the TetR family (PF00440). Members of this family take part in the regulation of numerous
pathways/operons: e.g. TetR is a tetracycline-inducible repressor (Hillen and Berens 1994), BetI a repressor
of the osmoregulatory choline-glycine betaine pathway (Lamark et al. 1996), and MtrR a regulator of cell
envelope permeability that acts as a repressor of the mtrCDE-encoded and an activator of the farAB-encoded efflux
pumps (Lee et al. 2003; Lee and Shafer 1999). We were unable to detect any reasonable homology for
the distal part of BXA0048; therefore no functional hypothesis can be drawn. The only indication of the
function of this regulator is its probable placement in one operon with a nucleotidyltransferase (BXA0047,
pXO1-33). The presence in the same operon of a nucleotidyltransferase and a superfamily II DNA and
RNA helicase family protein in Streptomyces coelicolor may suggest that BXA0048 is involved in
DNA metabolism.

BXA0060 (pXO1-40) belongs to a large superfamily of repressors (SCOP: a.35.1). It is composed of the
DNA-binding domain only. Homologues of BXA0060 are present in numerous archaeal and eubacterial
genomes, with no preservation of operon structure. It therefore seems that BXA0060 homologues are involved in
very diverse functions/pathways.

BXA0069 (pXO1-47) belongs to the family of global transcription activators of membrane-bound
multidrug transporters, responsible for bacterial multidrug resistance (MDR) (Paulsen et al. 1996). The
closest homologue is the B.subtilis MtnA regulator, which belongs to the MerR family (Summers 1992). It is
known to activate two MDR transporter genes (bmr and blt), ydfK, which encodes a transmembrane protein, and
its own gene (Baranova et al. 1999). It acts independently of two specific activators, BmrR and BltR, that
are encoded by the bmr and blt operons (Ahmed et al. 1995). MtnA and other members of the MerR family
are composed of three regions: an N-terminal DNA-binding domain (winged helix-turn-helix motif), a middle
all-helical dimerization region and a C-terminal part, specific for each protein, that is probably involved in
specific ligand binding (Godsey et al. 2001). BXA0069 fits this description well: it possesses two fairly
conserved distal regions and a 90-amino-acid region of no homology that has an almost 80% probability of a
coiled-coil structure (Lupas et al. 1991). Because its C-terminus does not resemble any known
regulatory domain, it is difficult to propose which metabolic pathway or genes the BXA0069 protein
activates.

FFAS analysis revealed a low-scoring similarity of BXA0122 (pXO1-89) to the MarR regulators of the
multiple antibiotic resistance locus (Grkovic et al. 2002; Seoane and Levy 1995). This regulon consists of
the marRAB operon and the marC gene. MarR acts as a repressor by binding as a dimer to promoter regions
of the mar regulon (Martin and Rosner 1995). Repressive DNA binding by MarR can be inhibited by
several anionic compounds, e.g. salicylate (Alekshun and Levy 1999).

AtxA is a proven regulator of anthrax toxin genes (Dai et al. 1995; Koehler et al. 1994; Uchida et al.
1993). It is also known to influence the expression of other genes on the pXO1 and pXO2 plasmids and in the anthrax
genome (Bourgogne et al. 2003). AtxA is a member of a large family containing the PTS (phosphoenolpyruvate-dependent
sugar-transporting phosphotransferase system) regulatory domain (Greenberg et al. 2002).
Members of this family usually have a duplicated DNA/RNA-binding domain and also a duplicated PTS
regulatory domain. Different variants of this architecture are known, and additional domains are often present.
The presence of PTS EII homology domains is most probably required for acting as an activator, since these
domains are lacking in antiterminators (Greenberg et al. 2002). Because of its domain structure (Fig. 4), AtxA is
believed to be a transcriptional activator. Knowing the architecture of this family, we searched the whole
anthrax genome to find all similar regulators. Among those we found (Fig. 4), apart from the
obvious AtxA and AcpA proteins, is BXB0060 (pXO2-53), named AcpB, whose regulatory activity has very
recently been confirmed (Drysdale et al. 2004). The diversity of domain composition and subtle
structural differences in this group of evolutionarily related anthrax regulators are certainly elements of a very
fine regulation of the stages of infection.

BXA0178 (pXO1-105) belongs to the AbrB family of "transition state regulators." AbrB was first
described in Bacillus subtilis as an activator and repressor of numerous genes during transitions in growth
phase (Phillips and Strauch 2002). Recently, Saile and Koehler (2002) showed that the
genomic copy of AbrB in B.anthracis regulates the expression of the three toxin genes, whereas the truncated
pXO1 version (BXA0178) does not affect toxin gene expression. We can speculate, then, that either the
truncation is crucial for BXA0178 function, or its influence on pXO1 function is not yet understood.

According to FFAS analysis, BXA0180 corresponds to the N-terminal part of a lambda repressor-like DNA-binding
domain (superfamily a.35.1), as classified by the SCOP database (Andreeva et al. 2004). The ORF is
truncated after the first half of the domain, and experiments are needed to check whether such a shortened
domain can exert any function.

BXA0206 (pXO1-137) belongs to the large family of Hfq proteins. Members of this family are known to
be involved in various metabolic processes, such as the regulation of iron metabolism (Masse and Gottesman
2002; Wachi et al. 1999), mRNA stability (Vytvytska et al. 1998), and the stabilization and degradation of RNAs
(Takada et al. 1999; Tsui et al. 1997). Hfq proteins are similar to the eukaryotic Sm proteins involved in RNA
splicing (Moller et al. 2002). The function of the pXO1 version is not known, and the RNA targeted by
BXA0206 has not been identified. The question remains whether BXA0206 acts on an RNA encoded by the
plasmid itself or has another function, e.g. acting on a chromosomal small RNA or mimicking the human Sm
protein.

Interesting  ORFs  from  the  "pathogenic"  region 

The  “pathogenic”  region  is  defined  as  extending  from  BXA0057  to  BXA0191  (Okinaka  et  al.  1999; 
Sirard  et  al.  2000),  and  is  obviously  of  special  interest. 


BXA0139:  an  ORF  implicated  in  Hemolysis? 

The BXA0139 (pXO1-124) protein is located close to the oedema factor gene (CyaA) on the pXO1 sequence.
It is 150 amino acids long and lies in an operon with two hypothetical proteins of unknown function,
BXA0138 (pXO1-125) and BXA0140 (pXO1-123). The only known fact about these proteins is the similarity of
BXA0138 to BXA0149 (pXO1-117) (Supplementary data).

The most interesting finding is the homology of BXA0139 to the C-terminal end of hemolysin II from
B.cereus (Miles et al. 2002). This homology has already been described by Miles et al. (2002), but only as a
similarity to a 46-amino-acid segment of BXA0139. In fact, BXA0139 consists of a duplication of this
fragment, and the C-terminal end of hemolysin II is similar to both the N- and C-terminal parts of BXA0139 (Fig.
3). The significance of the C-terminus of hemolysin II in B.cereus is unknown, and functional studies
suggest that it has no influence on the hemolytic activity of the enzyme (Baida et al. 1999; Miles et al. 2002).
Hemolysins form heptameric rings (Gouaux et al. 1997; Song et al. 1996), in which the C-terminal domain
would reside in the outer part of each monomer (Miles et al. 2002). Miles and colleagues (2002) suggest
three possible functions for this domain, without excluding other possibilities: it may be needed
to form lattices or to bind to surfaces, or it may have some catalytic activity. We also hypothesize an auxiliary,
perhaps regulatory, function for the main monomer domain. Quite peculiar is the presence of a tandem
tail-to-head repeat encoded by the pXO1 plasmid. It is not fused to any catalytic domain, and no overall function
for the whole operon is known. The most attractive hypothesis would be binding to surfaces: perhaps it
serves as an anchor to the host cell membrane during the attack?

An interesting finding may give a clue to the real function of BXA0139. We found a hemolysin II
homolog in the B. anthracis genome (gi: 21400399) that is almost identical to the B. cereus enzyme. However, in
all anthrax strains sequenced, there is a nonsense mutation (TGG to TGA) at the position corresponding to
tryptophan 372 of the B. cereus protein. To improve the prediction of the encoded peptide, we ran BLASTX using
the genomic sequence with large overhangs on both sides of the recognized ORF. The resulting sequence is
given in the alignment in Figure 3. So, if the anthrax mutation is real (and its presence in all anthrax strains
seems to reinforce this notion), we can hypothesize that BXA0139 plays an auxiliary role in the function
of the genomic copy of hemolysin II, whose C terminus is truncated.
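To make the effect of the TGG to TGA change concrete, the short Python sketch below scans a reading frame codon by codon and reports the first in-frame stop codon. The toy sequence and the helper names (`first_stop_codon`, `point_mutate`) are hypothetical illustrations, not the hemolysin II gene or part of the analysis used here; only the standard bacterial stop codons (TAA, TAG, TGA) are assumed.

```python
from typing import Optional

# Stop codons of the standard genetic code (also used by bacteria, table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(orf: str) -> Optional[int]:
    """Return the 1-based index of the first in-frame stop codon, or None."""
    orf = orf.upper()
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def point_mutate(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with the base at 0-based position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical toy ORF: ATG AAA CCC TGG GAA TAA (codon 4 is TGG, Trp).
wild_type = "ATGAAACCCTGGGAATAA"
mutant = point_mutate(wild_type, 11, "A")  # TGG -> TGA in codon 4

print(first_stop_codon(wild_type))  # 6 (the natural stop codon)
print(first_stop_codon(mutant))     # 4 (premature stop, truncated product)
```

Everything downstream of the premature stop is lost from the product, which is why the anthrax copy aligns with the B. cereus enzyme only up to that point.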

Reverse  homology  of  BXA0167 

This hypothetical ORF (pXO1-108) has no identifiable homologs, and its function is not known. It is a
product of automatic translation, so one might assume that it is not an interesting target for analysis.

We performed a BLASTX analysis along its sequence and found an interesting homology encoded by the
opposite strand. Interspersed with nonsense mutations, we found a strong homology to the N-terminus of the
lethal factor (corresponding to amino acids 9-176 of LEF) (data not shown). Notably, this homologous
region is encoded by the strand opposite to the LEF gene. Is it an example of a duplication event covered
up by other events that happened later in the course of evolution? Was this part of the N-terminal LEF
domain functional in the past?
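BLASTX can detect a homology encoded by the opposite strand because it compares proteins against all six conceptual translations of the query DNA: three reading frames on the given strand and three on its reverse complement. The sketch below illustrates only that translation step with the standard genetic code; it does not reproduce BLASTX scoring, and the example sequence is an arbitrary toy input.

```python
# Standard genetic code (NCBI table 1; bacterial table 11 assigns the same
# amino acids), listed for codons ordered T, C, A, G at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with unknown bases become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Frames +1..+3 on the given strand and -1..-3 on the opposite strand."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, peptide in six_frame_translations("ATGGCCATTGTAATGGGCCGC").items():
    print(frame, peptide)
```

In-frame stops appear as "*"; in a degraded copy such as the one described for BXA0167, a protein-level search would see stretches of similarity interrupted by such stops.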


DISCUSSION 

In our work we described many novel features of the pXO1 plasmid that had not been noticed previously. For
instance, we show that parts of pXO1 are related not only to other bacilli plasmids but also to proteins from
more distant species. One of the most unexpected findings was that pXO1 possesses two
operons with homology to type IV secretion and pilus assembly systems. This is surprising because the type IV
system is found mainly in Gram-negative bacteria (Burns 2003); only some elements of the pilus are present
in some Gram-positive bacteria (Grohmann et al. 2003; Wall and Kaiser 1999). It is even more surprising
that the operons are not complete. A tempting hypothesis, which should be tested experimentally, is that the
proteins present in pXO1 constitute a minimal set indispensable for the formation and function of the
secretion system. Alternatively, these operons may have drifted from their original function. Cases of both minimal
functional units and drift from original function are known in pathogens and symbionts. The discovery
of a type IV secretion system has the potential for a significant impact on our understanding of anthrax
virulence: a new pathogenic delivery pathway could be of major importance in the invasion process.

The similarity to various other bacteria and the copying of parts of operons show the phylogenetic
kaleidoscope nature of this megaplasmid. Apparently, this killing agent has developed by collecting
genomic pieces from a very broad range of bacteria, including pathogens as well as other
organisms. Some of these pieces may be non-functional (at least in their original role) or unrelated to
anthrax pathogenicity. It is worth noting that pXO1 shares similarity with other pathogenic bacteria also in
regions not previously recognized as part of the pXO1 pathogenicity island (see the operon conservation
with Burkholderia and Xanthomonas in Results), whose status may have to be revised.

A detailed analysis of the pXO1 sequence by Okinaka et al. (1999) focused mostly on mobile elements,
their number and their possible implications for the evolution of the plasmid. Our
findings not only suggest an extensive history of transposition but also allow us to hypothesize about the
probable entities that were used to build pXO1. Interestingly, although the type IV clusters are located inside
the putative PAI, which might suggest that they were an indispensable part of the plasmid sequence, the presence
of IS DD-E transposases suggests a new, independent insertion. Another possibility is that we are
dealing with a conjugative transposon, unusually equipped with a set of DD-E transposases instead of Tyr or
Ser recombinases. An important question for understanding pXO1 as a mobile entity is the location of its
replication machinery. We were unable to find it, which makes the question even more intriguing; however, we
identified the putative replication start and termination sites. The nature of replication should be informative
about the nature and provenance of the pXO1 plasmid.

The discovery of previously unknown systems on the pXO1 plasmid of course raises questions about their
regulation. External signals, cell state or host-pathogen interactions certainly trigger bacterial responses,
and several of them are already known (for a review, see Koehler 2002). All these signals ultimately activate
transcription of virulence-related genes. We have attempted to describe all the possible regulators we could
find, using sensitive profile-profile alignment programs. Some of the regulatory proteins are known not to
influence toxin function (e.g. the homologue of AbrB), but others are priority targets for experimental
studies of pathogenicity and B. anthracis biology. In particular, do the newly discovered factors regulate plasmid
genes or chromosomal genes?

We do not know how important the common motif shared by AtxA-regulated genes is. Its variable
location within the putative promoter regions (closer to or further from the ATG) raises questions. However,
there may be ORFs not yet recognized 5' of those that are AtxA-dependent; in this case, the
recognized ANGGAG sequence would directly precede the operon. Deletion experiments are needed to test
whether these cis elements have any impact on the function of AtxA-regulated genes.

Another interesting finding is the diversity of ArsR homologs in B. anthracis. The majority of these,
including those on pXO1, are related to the activator subfamily. The functions of the MarR and TetR regulators
are also intriguing.

Two striking features of the whole plasmid attracted our special attention. The first is the presence
of so many DNA metabolism-related proteins (15%) (Supplementary data). It seems that DNA is central
to the function of pXO1. Is this function related to the processing of pXO1, chromosomal DNA,
transposons, or host DNA? None of these hypotheses can be excluded at the moment; the type IV delivery
system could indicate that some of these proteins have an external function. Second, when analyzing
the DNA and proteome of pXO1 we realized how messy it is. pXO1 is full of incomplete and mutated ORFs
(see Results and Supplementary data). There are many traces of ancient duplications, some still fresh (strong
homology), others almost completely faded away (homology barely recognizable) and often disrupted. It
also contains ORFs “borrowed” from other species. pXO1 seems to be subject to constant
evolutionary flux. The pXO1 plasmid should carry a tag: “under construction.”


METHODS 


Gene  names 

The pXO1 plasmid has been sequenced at least twice, by two independent research groups. Interestingly, the
two sequences differ significantly, both at the DNA level and at the (predicted) protein level. The second, more
recent sequencing identified almost 100 additional genes on pXO1. Several alternative naming conventions
for B. anthracis plasmid proteins are used in the literature. We use the names assigned by the second pXO1 sequencing team
(Read et al. 2003) (e.g. BXA007) as our primary names, but where appropriate we also provide the names
used by the previous sequencing team (e.g. pXO1-04) or common gene names used in the literature (e.g.
AtxA).

DNA  level  analysis 

The Bacillus anthracis strain A2012 pXO1 plasmid sequence was used for the analysis (accession:
NC_003980) (Read et al. 2003).

We used the Oriloc program (Frank and Lobry 2000) to detect the pXO1 origin of replication, using the gene
coordinates provided in the pXO1 GenBank file.
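Oriloc works from compositional asymmetries computed over the annotated genes, hence its need for gene coordinates. As a simpler, related illustration (explicitly not the Oriloc algorithm), the sketch below computes a cumulative GC skew, the running sum of (G - C)/(G + C) over non-overlapping windows; in many bacterial replicons the minimum of this curve lies near the origin and the maximum near the terminus. The window size and the toy replicon are arbitrary assumptions.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list:
    """Running sum of per-window GC skew, (G - C) / (G + C)."""
    seq = seq.upper()
    cumulative, total = [], 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if g + c else 0.0
        cumulative.append(total)
    return cumulative


def skew_extremes(seq: str, window: int = 1000) -> tuple:
    """Approximate positions (bp) of the cumulative skew minimum and maximum.

    The minimum is the usual origin candidate and the maximum the terminus
    candidate; each is reported at the end of the window where it occurs.
    """
    cum = cumulative_gc_skew(seq, window)
    i_min = min(range(len(cum)), key=cum.__getitem__)
    i_max = max(range(len(cum)), key=cum.__getitem__)
    return (min((i_min + 1) * window, len(seq)),
            min((i_max + 1) * window, len(seq)))


# Hypothetical toy replicon: C-rich, then G-rich, then C-rich again.
toy = "C" * 2000 + "G" * 4000 + "C" * 2000
print(skew_extremes(toy, window=1000))  # (2000, 6000)
```

Real sequences give a much noisier curve, so in practice the window size matters and the extremes are only approximate boundaries.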

For the analysis of common DNA features in the promoter regions of AtxA-dependent genes (Bourgogne et
al. 2003), we used the entire DNA sequence between the end of the preceding gene and the ATG
neighbourhood of the AtxA-regulated gene. We used the 5' regions of the following genes from the pXO1 and
pXO2 plasmids: BXA0019 (pXO1-13), BXA0124 (pXO1-90), BXA0125 (pXO1-91), BXA0137 (pXO1-126),
BXA0142 (cyaA), BXA0164 (pagA), BXA0172 (lef), BXB0045 (pXO1-31), BXB0060 (pXO1-40),
BXB0066 (pXO1-58), BXB0074, BXB0084 (pXO1-124). We used the MEME and MITRA programs to search
for common motifs (Bailey and Elkan 1994; Eskin and Pevzner 2002).
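Once a candidate motif such as ANGGAG has been reported by MEME or MITRA, its position relative to the start codon in each 5' region (the variable distance raised in the Discussion) can be tabulated with a simple scan. The sketch below is an illustration, not part of the MEME or MITRA software; it assumes each upstream region is supplied as a string ending immediately before the ATG, and the gene labels and sequences are hypothetical placeholders.

```python
import re

# ANGGAG with IUPAC N = any base. The lookahead lets finditer report
# overlapping occurrences.
MOTIF = re.compile(r"(?=(A[ACGT]GGAG))")


def motif_hits(upstream: str) -> list:
    """Return (start, distance_to_ATG) for each ANGGAG occurrence.

    Assumes `upstream` ends immediately 5' of the start codon, so the
    distance is the number of bases between the motif's last base and the ATG.
    """
    seq = upstream.upper()
    hits = []
    for m in MOTIF.finditer(seq):
        start = m.start(1)
        end = start + len(m.group(1))
        hits.append((start, len(seq) - end))
    return hits


# Hypothetical upstream regions (labels and sequences are placeholders).
regions = {
    "geneA": "TTTTAAGGAGTTTTTTTT",
    "geneB": "CCCCCCCCCC",
}
for gene, seq in regions.items():
    print(gene, motif_hits(seq))
# geneA [(4, 8)]
# geneB []
```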

Protein  level  analysis 

For the analysis of the pXO1 proteome, we used the proteins available under the BXAxxxx NCBI identifiers,
supplemented with BLASTX analysis (Altschul et al. 1990).


To analyze the protein sequences, we used the following programs: BLAST tools (Altschul et al. 1990;
Altschul et al. 1997), SMART (Letunic et al. 2002), Pfam (Bateman et al. 2002), CDD (Marchler-Bauer
et al. 2003), TMHMM2.0 (Sonnhammer et al. 1998), SEED (Read et al. 2003), Radar (Heger and Holm
2000), FFAS03 (Rychlewski et al. 2000), Metaserver.pl (Ginalski et al. 2003) and Superfamily (Gough et al.
2001).

To  align  sequences  we  used:  T-COFFEE  (Notredame  et  al.  2000),  AliBee  (Nikolaev  et  al.  1997), 
MultAlin  (Corpet  1988),  BioEdit  (Hall  1999). 

Phylogenetic trees were estimated from amino acid alignments using PHYML (Guindon and Gascuel
2003), a fast and accurate maximum likelihood heuristic, under the JTT substitution model (Jones et al.
1992), with a gamma distribution of rates across sites (eight categories, parameter alpha estimated by
PHYML). Bootstrap support of branches was estimated using the programs SEQBOOT and CONSENSE of
the PHYLIP package (Felsenstein 2002) with 1000 replicates; the parameter alpha was estimated
independently for each replicate.
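The bootstrap procedure resamples alignment columns with replacement to build pseudo-replicate alignments of the original length (the SEQBOOT step); each replicate is then analysed by the tree program, and the percentage of replicate trees containing a branch becomes its support value (the CONSENSE step). The sketch below illustrates only the column resampling, the percentage calculation and the 50% reporting threshold used in the figure legends, on a hypothetical toy alignment; it is not the PHYLIP implementation, and tree building is omitted.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """One pseudo-replicate: alignment columns resampled with replacement.

    `alignment` maps sequence names to aligned sequences of equal length;
    the same column indices are used for every sequence.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def support_percent(found_in_replicate: list) -> float:
    """Percentage of replicate trees in which a given branch was recovered."""
    return 100.0 * sum(found_in_replicate) / len(found_in_replicate)


def node_label(percent: float, threshold: float = 50.0) -> str:
    """Node label; values under the threshold are left blank (not reported)."""
    return f"{percent:.0f}" if percent >= threshold else ""


# Hypothetical toy alignment; each replicate would next be analysed by the
# tree-building program, which is not shown here.
aln = {"seqA": "MKV-LLA", "seqB": "MKVALLA", "seqC": "MRV-ILA"}
rng = random.Random(42)
replicates = [bootstrap_alignment(aln, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
print(node_label(support_percent([True] * 870 + [False] * 130)))  # 87
print(repr(node_label(42.0)))  # '' (under 50%, not reported)
```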
FIGURE LEGENDS

Figure 1. A summary of the distribution of homologs of the predicted proteins (ORFs) encoded in the pXO1
plasmid in a set of >100 diverse microbial genomes. Only relatively close homologues (with FASTA P-score
above 10^-3) were taken into account at this stage of the analysis. The relative size and polarity of ORFs
(using the predictions and the nomenclature by TIGR) on the linearized map of pXO1 are illustrated by the
heights (cutoff at 500 amino acids) and orientation of the bars along the X-axis (panel A, continued in panel
B). Open bars correspond to proteins for which no homologues were detected in this analysis. Bars
with matching colored borders correspond to “repeats” present in pXO1. Black and colored bars
correspond to proteins for which at least one homolog was detected in this analysis.

Panel C (and its continuation in panel D) marks the presence of the respective homologues in at least one of
the representative genomes in several groups (as indicated in the respective boxes):

Group 1: B. anthracis (chromosome or pXO2), B. thuringiensis or B. cereus.

Group 2: B. subtilis, B. halodurans or B. stearothermophilus.

Group 3: Staphylococci, Streptococci or Enterococci species.

Group 4: Salmonella, Xanthomonas or Burkholderia species.

Group 5: Geobacter, Anabaena or Nostoc species.

These genomes contain the largest number of homologues of pXO1-borne proteins, and jointly they
provide nearly complete coverage of the phylogenetic space of pXO1 homologues.

Figure  2.  Type  IV  secretion  and  pilus  systems  representations  with  homologous  genes  in  B. anthracis 
shown  in  red.  It  is  worth  noting  that  in  the  secretion  operon  representation,  the  anthrax  VirB6  gene  is  fused 
to  an  adhesin-like  long  sequence,  whereas  in  the  pilus  assembly  operon  the  last  homologue,  TadC,  has  two 
representations  in  the  anthrax  operon.  For  more  detailed  comparison  to  known  type  IV  secretion  and  pilus 
assembly  systems,  see  (Christie  2001;  Christie  and  Vogel  2000;  Kachlany  et  al.  2000;  Kachlany  et  al.  2001; 
Skerker  and  Shapiro  2000). 

Figure 3. Multiple alignment of the Bacillus cereus terminal hemolysin II domain, two parts of the
BXA0139/pXO1-124 protein, the Streptococcus phage Cp-1 orf16 and the B. anthracis hemolysin II copy
with a truncated C terminus. The star represents the stop codon in the anthrax DNA sequence.

Figure 4. The domain structure of the AtxA family of proteins from B. anthracis. Each colour depicts a
family of the most homologous sequences. Similar colours denote duplicated sequences.


Figure  5.  Phylogenetic  trees  of  ArsR/SmtB  proteins. 

Phylogenies estimated using PHYML (Guindon and Gascuel 2003). Figures at nodes are bootstrap support
in % of 1000 replicates; bootstrap proportions under 50% are not reported. Branch length is proportional to
the estimated number of substitutions per site. Proteins from pXO1 are boxed.

(A) Phylogeny of representative proteins sampling the diversity of the ArsR/SmtB family. Two B. anthracis
proteins with short sequences are not included (Q81NE6 and Q81QQ6). Unrooted tree drawn using
TreeView (Page 1996); the scale bar represents 0.1 substitutions/site. The boxes indicate clades
(monophyletic groups) discussed in the text.

(B) Phylogeny of pXO1 ArsR/SmtB proteins and close homologues. This corresponds to the box "close
homologs of pXO1 proteins" in (A), plus all closely related homologs as determined from a phylogeny of all
available ArsR/SmtB sequences (487 sequences; tree not shown). The tree was rooted according to the phylogeny of
all ArsR/SmtB proteins and drawn using NJplot (Jeanmougin et al. 1998); the scale bar represents 0.5
substitutions/site. Full circles indicate gene duplications in the common ancestor of B. anthracis and B. cereus;
the empty circle indicates a gene duplication in B. thuringiensis.


Principal  Investigator:  Liddington,  Robert  C. 


VirFact:  a  relational  database  of  virulence  factors  and 
pathogenicity  islands  (PAIs). 


Adrian  Tkacz,  Leszek  Rychlewski  and  Adam  Godzik 


ABSTRACT 


The VirFact database (http://virfact.burnham.org) contains information on microbial
virulence factors and pathogenicity islands (PAIs) from major pathogens. The database
collects information from the literature and combines it with results obtained by genome
context analysis and distant homology recognition. The database can be browsed by virulence
factor, PAI or organism name. The annotations, including multiple alignments of proteins
homologous to virulence factors, genomic context and models of three-dimensional structures (if
available), are presented using a graphical web interface and standard visualization tools.
VirFact can also be used as a tool to recognize homologs of known virulence
factors in a genome supplied by the user.


INTRODUCTION 

Recent developments in comparative genomic analysis and experimental molecular
biology techniques have made it possible to identify specific genes responsible for the virulence of
pathogenic microbes. Despite some debate (1), it is widely accepted that the virulence of a
pathogenic microbe is imparted by a specific set of genes, often localized together on a
plasmid (virulence plasmids) or on the chromosome (pathogenicity islands). Virulence factors are
typically identified by comparing genomic sequences of pathogenic and non-pathogenic
strains or by studying the virulence of deletion mutants. In building VirFact we adhered to a
broad definition of a virulence factor that includes genes specifically involved in interactions
between a pathogen and its host, but also genes supporting a pathogenic lifestyle and many
genes of unknown function if they are part of a genomic structure related to pathogenicity.
Virulence factors of many organisms are well studied, but information about them is
usually available only in the specialized literature, and then usually only in the context of a
specific organism. We believe that this scattering of information makes it difficult to study
general questions about pathogenicity, such as the similarity between the virulence
apparatus of unrelated pathogens. At the same time, sequence analysis and annotation of
many virulence-related genes are very uneven, and tools such as distant homology analysis, fold
recognition or modeling are seldom used. The goal of the VirFact project is to develop
a well-annotated database containing information about pathogenicity systems from
different organisms and providing a uniform level of annotation, including annotations obtained with
the most sensitive algorithms.


THE  DATABASE 

The  VirFact  database  (http://virfact.burnham.org)  is  implemented  as  a  relational  database 
containing  a  collection  of  virulence  factors  and  pathogenicity  islands  from  major  microbial 
pathogens.  The  current  release  of  VirFact  is  divided  into  five  main  areas  (discussed  below) 
providing  different  approaches  and  views  to  data  analysis: 

•  a  collection  of  individual  virulence  factors 

•  a  collection  of  pathogenicity  islands 

•  source  genomes 

•  annotations  and  prediction  results 

•  links 
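These areas map naturally onto a small set of linked tables. The sketch below, using Python's built-in `sqlite3` module, shows one hypothetical way to relate genomes, PAIs and virulence factors so that browsing by PAI or by genome becomes a simple join. Table names, column names and records are assumptions for illustration, not VirFact's actual schema or data; annotations and links are omitted.

```python
import sqlite3

# Hypothetical schema (not VirFact's): genomes, PAIs and virulence factors.
SCHEMA = """
CREATE TABLE genome (
    genome_id INTEGER PRIMARY KEY,
    organism  TEXT NOT NULL
);
CREATE TABLE pai (
    pai_id    INTEGER PRIMARY KEY,
    genome_id INTEGER NOT NULL REFERENCES genome(genome_id),
    name      TEXT NOT NULL,
    start_bp  INTEGER,
    end_bp    INTEGER
);
CREATE TABLE virulence_factor (
    vf_id     INTEGER PRIMARY KEY,
    genome_id INTEGER NOT NULL REFERENCES genome(genome_id),
    pai_id    INTEGER REFERENCES pai(pai_id),  -- NULL if not in a PAI
    name      TEXT NOT NULL,
    sequence  TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory database filled with placeholder records."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO genome VALUES (1, 'Example pathogen')")
    con.execute("INSERT INTO pai VALUES (1, 1, 'PAI-1', 10000, 45000)")
    con.executemany(
        "INSERT INTO virulence_factor VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, "factorA", "MKV"), (2, 1, None, "factorB", "MST")],
    )
    return con


def factors_in_pai(con: sqlite3.Connection, pai_name: str) -> list:
    """Browse by PAI: names of the virulence factors located in it."""
    rows = con.execute(
        "SELECT vf.name FROM virulence_factor AS vf "
        "JOIN pai ON vf.pai_id = pai.pai_id "
        "WHERE pai.name = ? ORDER BY vf.name",
        (pai_name,),
    )
    return [name for (name,) in rows]


def factors_in_genome(con: sqlite3.Connection, organism: str) -> list:
    """Browse by genome: all virulence factors encoded in it."""
    rows = con.execute(
        "SELECT vf.name FROM virulence_factor AS vf "
        "JOIN genome AS g ON vf.genome_id = g.genome_id "
        "WHERE g.organism = ? ORDER BY vf.name",
        (organism,),
    )
    return [name for (name,) in rows]


con = build_demo_db()
print(factors_in_pai(con, "PAI-1"))                # ['factorA']
print(factors_in_genome(con, "Example pathogen"))  # ['factorA', 'factorB']
```

SQLite parses the `REFERENCES` clauses but enforces them only when `PRAGMA foreign_keys = ON` is set on the connection.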

The first section contains basic information about individual virulence factors, such as their
amino acid sequences, annotations collected from the literature and links to other fields in the
database. This area is de facto the core of the system.

Individual virulence factors from a given organism often form operon-like structures
called pathogenicity islands (PAIs); information about them forms the next area of the
VirFact database. Additional data, such as the position of a PAI in the genome, its short
characterization and the list of genes it contains, are provided here. Since PAIs usually evolve by
lateral transfer, they differ in many features from the host genome. To aid in identifying
novel PAIs, the user can view a chart (deposited in the database) showing the genomic regions that
deviate most from the rest of the genome. This deviation is based on three compositional
criteria: G+C content, dinucleotide frequency and codon usage (2).
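Of the three compositional criteria, G+C content is the simplest to illustrate. The sketch below flags non-overlapping windows whose G+C fraction deviates from the mean over all windows by more than a chosen number of standard deviations. It is a single-criterion simplification for illustration only, not the iterative discriminant analysis of Tu and Ding (2); the window size, cutoff and toy genome are arbitrary assumptions.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_gc_windows(genome: str, window: int = 5000,
                        z_cutoff: float = 2.0) -> list:
    """Return (start, gc, z) for non-overlapping windows with |z| > z_cutoff."""
    starts = list(range(0, len(genome) - window + 1, window))
    gcs = [gc_fraction(genome[s:s + window]) for s in starts]
    if len(gcs) < 2:
        return []
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:
        return []
    return [(s, gc, (gc - mu) / sigma)
            for s, gc in zip(starts, gcs)
            if abs(gc - mu) / sigma > z_cutoff]


# Hypothetical toy genome: 40% G+C background with one 70% G+C window.
background = "GC" * 20 + "AT" * 30  # 100 bp, G+C = 0.4
island = "GC" * 35 + "AT" * 15      # 100 bp, G+C = 0.7
genome = background * 5 + island + background * 4
for start, gc, z in atypical_gc_windows(genome, window=100):
    print(start, round(gc, 2), round(z, 1))  # 500 0.7 3.0
```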

For individual virulence factors, the annotations and the results of analysis and prediction tools
provide information about homologs, genomic context and other properties of the
chosen virulence factor, as discussed in detail below.

Finally, links to the sections described above and to various addresses that are useful for the
user or necessary for the service are listed in a separate area of the website. The current (July
20, 2004) release of VirFact contains about 400 proteins, 12 pathogenicity islands (PAIs) and
7 completely sequenced genomes, and it is growing constantly.


THE  WEB  SITE 


VirFact is publicly available on the web at http://virfact.burnham.org. The database can be
browsed by virulence factor name, PAI or genome using the links at the top of the main web
page:

- the “Virulence Factors” link: lets the user see all virulence factors deposited in the
database

- the “PAIs” link: displays all PAIs contained in VirFact. After selection of a specific PAI,
the composition of the PAI proteins is shown.

- the “Genome” link: leads the user to an interface listing all VirFact proteins encoded in
the selected genome. An additional feature is a chart showing the genomic regions that
deviate most from the rest of the genome, which could form new, as yet unrecognized PAIs.

For each displayed virulence factor, there are links on the right side of the web page to
annotation and prediction results, to the sequence in FASTA format, and to other potentially
useful resources such as NCBI PubMed. The link called “Homologs” allows the user to view
PSI-BLAST (3), FFAS03 (4) or T-Coffee (5) results. PSI-BLAST is used to compare a query
sequence with those contained in the non-redundant protein database at NCBI by performing an
iterative BLAST search. It is the most sensitive widely used program for recognizing
homologs, making it useful for finding very distantly related proteins. The “FFAS” link
shows the results of the FFAS03 server, a profile-profile alignment algorithm used for highly
sensitive recognition of distant homologs and fold assignment. Finally, the links called
“Alignment” and “Tree” lead to T-Coffee results, where a multiple alignment was built using
proteins found by the PSI-BLAST search. The T-Coffee results can be visualized with the
“JalView” (multiple sequence alignment viewer, 6) and “A Tree Viewer (ATV)”
(phylogenetic tree viewer, 7) applications (both programs require a Java Virtual Machine).

The “Genomic Context” interface was designed to analyze genomic context using The SEED
system (http://theseed.uchicago.edu/FIG/index.cgi) for genome annotation. As described by
Overbeek et al., SEED is designed to help a researcher study a specific subsystem (set of genes),
supporting community-wide annotation of genomes and the search for specific missing genes.
SEED focuses on the conservation of genomic context between homologs of a specific gene. In
VirFact, we compared the genomic context of close homologs of the virulence factor being studied.
It is important to note that SEED uses its own definition of a homolog, typically much more
conservative than would result from a PSI-BLAST search.
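The idea of comparing the genomic context of close homologs can be sketched as counting in how many genomes two gene families sit next to each other. In the toy example below each genome is a hypothetical list of gene-family labels in chromosomal order; the labels and the adjacency criterion (immediate neighbours only, ignoring strand and chromosome circularity) are assumptions for illustration and do not reproduce how SEED defines functional coupling.

```python
from collections import Counter


def conserved_neighbours(genomes: list, min_genomes: int = 2) -> dict:
    """Count, per pair of gene families, the genomes in which they are adjacent.

    Each genome is a list of family labels in chromosomal order; a pair is
    counted at most once per genome. Pairs seen in fewer than `min_genomes`
    genomes are dropped.
    """
    counts = Counter()
    for order in genomes:
        pairs = {frozenset(p) for p in zip(order, order[1:]) if p[0] != p[1]}
        counts.update(pairs)
    return {tuple(sorted(pair)): n for pair, n in counts.items()
            if n >= min_genomes}


# Hypothetical gene orders in three genomes ("vfX" = a virulence factor family).
genomes = [
    ["vfX", "hypA", "hypB", "gyrA"],
    ["recA", "vfX", "hypA", "rpoB"],
    ["hypA", "vfX", "hypB"],
]
print(conserved_neighbours(genomes))  # {('hypA', 'vfX'): 3}
```

A pair that stays adjacent across several genomes, like the hypothetical vfX/hypA pair here, is the kind of signal that points to functional coupling between a virulence factor and proteins of unknown function.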

VirFact can also be queried, using the web-based interface called “Scan”, for the presence of
homologs of the virulence factors it covers in a genome provided by the user. The search takes
some time, up to several minutes, depending on the genome size. The output page shows potential
virulence factors in the user's genome, with the similarity score to known virulence factors, the
position in the genome and the sequence alignment to the “parent” virulence factor in FASTA
format. As an example, we show here a short analysis of the Francisella tularensis genome. This
example focuses on how to use the VirFact website; the full analysis of the potential virulence
factors in the F. tularensis genome will be presented elsewhere. As shown in the chart (Fig.
1), there is a peak around 45 kb, indicating that this region differs strongly from the rest of the
genome. In the same region VirFact found a protein similar to the “Z0262 gene product” of
Escherichia coli. Further analysis indicates that this hypothetical E. coli protein has a
homolog described only in Francisella tularensis, called IglB, which has been reported to be
associated with intracellular growth (8). Moreover, the neighborhood of the “Z062 gene product”
shows functional coupling with other proteins of unknown function often present in other pathogens.


UPDATES 


Parsing, annotation and data updates have been automated to minimize human
intervention. The VirFact database will be updated at least every two months to keep the data
current. The information about PAIs is manually curated.


FUTURE  PERSPECTIVES 


VirFact was developed as a relational database of PAIs and virulence factors for the
comprehensive representation of pathogenicity in various prokaryotic organisms. A web
interface was designed to provide easy access to its various features. To our knowledge, this is the only
database devoted exclusively to pathogenicity islands and virulence factors that provides a
variety of tools for data analysis. We plan to expand the VirFact database to incorporate all
annotated PAIs from all completely sequenced genomes and all virulence-related
genes/proteins described in the literature. In the near future we would like to extend VirFact with
new tools for predicting surface regions and transmembrane regions of proteins. We believe
VirFact will be a useful tool for investigating bacterial virulence and for detecting
virulence factors in newly sequenced genomes.


ACKNOWLEDGEMENTS 

We would like to thank Dr Ross Overbeek for The SEED: an Annotation/Analysis Tool,
which made possible the development of the genome context part of the VirFact service. The authors also
thank Zhanwen Li for her help with the FFAS calculations. The work was partially funded by the
6FP grant MicrobeArray (to LR) and United States Army Medical Research and Materiel
Command Grant DAMD17-03-2-0038 (to AG).


REFERENCES 

1. Wassenaar,T.M. and Gaastra,W. (2001) Bacterial virulence: can we draw the line? FEMS
Microbiol. Lett., 201, 1-7.


2.  Tu,Q.  and  Ding,D.  (2003)  Detecting  pathogenicity  islands  and  anomalous  gene  clusters  by 
iterative  discriminant  analysis.  FEMS  Microbiol.  Lett.,  221,  269-275. 

3. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J.
(1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res., 25, 3389-3402.

4. Rychlewski,L., Jaroszewski,L., Li,W. and Godzik,A. (2000) Comparison of sequence profiles.
Strategies for structural predictions using sequence information. Protein Science, 9, 232-241.

5. Notredame,C., Higgins,D.G. and Heringa,J. (2000) T-Coffee: A novel method for fast and
accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.

6. Clamp,M., Cuff,J., Searle,S.M. and Barton,G.J. (2004) The Jalview Java alignment editor.
Bioinformatics, 20, 426-427.

7. Zmasek,C.M. and Eddy,S.R. (2001) ATV: display and manipulation of annotated
phylogenetic trees. Bioinformatics, 17, 383-384.

8. Gray,C.G., Cowley,S.C., Cheung,K.K. and Nano,F.E. (2002) The identification of five genetic
loci of Francisella novicida associated with intracellular growth. FEMS Microbiol. Lett., 215,
53-56.


Figure 1. Graphic illustration of using VirFact to search for homologs of virulence factors in
a genome supplied by the user. The chart of discriminant scores shows the region that
deviates most from the rest of the genome. In this region VirFact found a protein
similar to the “Z062 gene product” of Escherichia coli. The PSI-BLAST results show that the “Z062
gene product” is similar in sequence to IglB [Francisella tularensis]. Moreover, the “Genomic
Context” interface shows a significant neighborhood of Z062 with other proteins (in the table, the
“Z062 gene product” is no. 1, labeled “hypothetical protein”).


